Methods for selecting highly functional nucleic acid molecules within cells

ABSTRACT

The present invention relates to a method for selecting a functional nucleic acid molecule functioning in a cell, cytoplasm or nucleus, which comprises constructing an expression vector which comprises, under the control of a promoter, a candidate nucleic acid sequence comprising a randomized sequence in a portion of a nucleic acid molecule or in another nucleic acid ligated to the nucleic acid molecule; introducing the expression vector into a cell; culturing the cell to express the nucleic acid molecule; collecting and disrupting the cell to prepare an extract of an entire cell, a cytoplasm fraction or a nucleus fraction; incubating the extract of an entire cell, the cytoplasm fraction or the nucleus fraction for a certain period of time; and obtaining a nucleic acid molecule remaining in the extract of an entire cell, the cytoplasm fraction or the nucleus fraction, and to a novel functional nucleic acid molecule selected by this method.

FIELD OF THE INVENTION

[0001] The present invention relates to a method for selecting a nucleic acid molecule, which is highly stable and highly functional within cells, and also to a novel functional nucleic acid molecule selected by the method.

BACKGROUND OF THE INVENTION

[0002] At the beginning of the 1980s, the ribozyme (ribonucleic acid + enzyme), an RNA having a catalytic function, was found through the discovery of a self-splicing phenomenon in Tetrahymena rRNA (K. Kruger, P. J. Grabowski, A. J. Zaug, J. Sands, D. E. Gottschling, T. R. Cech, Cell, 31, 147-157 (1982)) and through the analysis of ribonuclease P, a complex enzyme composed of RNA and protein (C. Guerrier-Takada, K. Gardiner, T. Marsh, N. Pace, S. Altman, Cell, 35, 849-857 (1983)). Various ribozymes have since been discovered (R. H. Symons, Trends Biochem. Sci., 14, 445-450 (1989); R. H. Symons, Annu. Rev. Biochem., 61, 641-671 (1992); J. Bratty, P. Chartrand, G. Ferbeyre, R. Cedergren, Biochim. Biophys. Acta, 1216, 345-359 (1993); J. Haseloff, W. L. Gerlach, Nature, 334, 585-591 (1988); C. J. Hutchins, P. D. Rathjen, A. C. Forster, R. H. Symons, Nucleic Acids Res., 14, 3627-3640 (1986)). Like the discoveries of reverse transcriptase, introns and RNA editing, the discovery of the ribozyme challenged the established central dogma. At the same time, RNA, which had not been recognized as anything more than an informational intermediary, began to attract attention because of its dual role as a carrier of (genetic) information and of (catalytic) function, becoming the central molecule of the “RNA world” hypothesis, which places RNA at the true origin of life (G. F. Joyce, Nature, 338, 217-224 (1989); N. R. Pace, T. L. Marsh, Origins of Life, 16, 97 (1985); A. Lazcano, R. Guerrero, J. Oro, J. Mol. Evol., 27, 283 (1988); L. E. Orgel, Nature, 358, 203 (1992); R. F. Gesteland, J. F. Atkins, The RNA World, Monograph 24, Cold Spring Harbor Laboratory Press, Plainview, N.Y. (1993)).

[0003] The hammerhead ribozyme is one of the best-studied ribozymes to date (FIG. 1a). In nature, it functions in a self-cleaving reaction (in cis) (T. R. Cech, Annu. Rev. Biochem., 59, 543 (1990); A. C. Forster, R. H. Symons, Cell, 49, 211 (1987); A. C. Jeffries, R. H. Symons, Nucleic Acids Res., 17, 1371 (1989)). After Uhlenbeck et al. and Haseloff and Gerlach et al. divided the ribozyme into two RNA strands, i.e., a substrate-binding region and a region retaining the enzyme activity (O. C. Uhlenbeck, Nature, 328, 596 (1987); J. Haseloff, W. L. Gerlach, Nature, 334, 585 (1988)), the possibility of applying ribozymes to gene therapy was suggested (FIG. 1b). A large number of applied studies targeting cancer, AIDS, etc. have since been reported (M. Cotten, M. L. Birnstiel, EMBO J., 8, 3861 (1989); N. Sarver, E. Cantin, O. Chang, O. Lande, D. Stephens, J. Zaia, J. Rossi, Science 247, 1222 (1990); M. Homann, M. Tzortzakari, K. Rittner, S. Sczakiel, M. Tabler, Nucleic Acids Res 21, 2809 (1993); R. C. Mulligan, Science 260, 926 (1993); S. Altman, Proc. Natl. Acad. Sci. USA, 90, 10898 (1993); P. Marschall, J. B. Thompson, F. Eckstein, Cell. Mol. Neurobiol., 14, 523 (1994); S. M. Sullivan, J. Invest. Dermatol., 103, 85 (1994); F. H. Cameron, P. A. Jennings, Antisense Res. Dev., 4, 87 (1994); L. Q. Sun, D. Warrilow, L. Wang, C. Witherington, J. Macpherson, G. Symonds, Proc. Natl. Acad. Sci. USA, 91, 9715 (1994); R. E. Christoffersen, J. J. Marr, J. Med. Chem. 38, 2023 (1995); G. Ferbeyre, J. Bratty, H. Chen, R. Cedergren, Gene 155, 45 (1995); M. Kiehntopf, E. L. Esquivel, M. A. Brach, F. Herrmann, J. Mol. Med., 73, 65 (1995); J. D. Thompson, D. Macejak, L. Couture, D. T. Stinchcomb, Nat. Med. 1, 277 (1995); T. Tuschl, J. B. Thomson, F. Eckstein, Curr. Opin. Struct. Biol. 5, 296 (1995)).

[0004] A ribozyme binds to a substrate RNA by forming complementary base pairs with it. Cleavage of the substrate RNA then occurs in the presence of magnesium ions, which are essential for the reaction. Because the substrate is recognized through the formation of appropriate base pairs between the substrate-binding regions of Stem I and Stem III of the ribozyme and the corresponding substrate sequences, the substrate specificity of the ribozyme is very high. Owing to this high specificity, side effects in the cell are unlikely, which is a considerable advantage when a ribozyme is used as an inhibitor of gene expression. In practice, however, ribozymes do not necessarily function effectively in cells. This is thought to be due to reduced stability or activity of the ribozyme itself in the cell, as well as to problems with its expression efficiency or localization. Various proteins in the cell interact with the ribozyme or the target RNA and can thereby inhibit the RNA cleavage reaction. It is also quite possible that ribonucleases affect the stability and activity of the ribozyme. To overcome these problems, stabilization of ribozymes by chemical modification was initially proposed, but this approach was of limited effectiveness in view of cytotoxicity (F. Eckstein, D. M. J. Lilley (eds.), Nucleic Acids and Molecular Biology, vol. 10 (1996); L. Beigelman, J. A. McSwiggen, K. G. Draper, C. Gonzalez, K. Jensen, A. M. Karpeisky, A. S. Modak, J. Matulic-Adamic, A. B. DiRenzo, P. Haeberli, D. Sweedler, D. Tracz, S. Grimm, F. E. Wincott, V. G. Thackray, N. Usman, J. Biol. Chem. 270, 25702 (1995)).
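Substrate recognition through Stem I and Stem III can be illustrated with a short Python sketch that builds the two binding arms of a hammerhead ribozyme as reverse complements of the sequences flanking a chosen NUH site. The catalytic core string `CORE` is an assumption: it is inferred from the reverse complement of the probe given as SEQ ID NO: 1 in Example 2 and is not stated as such in the text. The helper name `design_hammerhead`, the 8-nt default arm length and the pairing convention (U of NUH paired with the last A of the core, H left unpaired) follow common hammerhead design practice rather than anything specified in this document.

```python
# Watson-Crick complements used to build the antisense binding arms
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# ASSUMPTION: catalytic core inferred from the reverse complement of the
# probe in SEQ ID NO: 1; it is not given explicitly in the text.
CORE = "CUGAUGAGGCCGAAAGGCCGAAA"

def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (DNA input: T is read as U)."""
    rna = rna.upper().replace("T", "U")
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def design_hammerhead(target: str, h_index: int, arm_len: int = 8) -> str:
    """Hypothetical helper: ribozyme cleaving `target` 3' of position h_index.

    target[h_index - 2 : h_index + 1] must be an NUH triplet. The 5' arm pairs
    with the arm_len nucleotides 3' of the cleavage site (Stem I); the 3' arm
    pairs with N and the nucleotides 5' of it (Stem III). U of NUH is assumed
    to pair with the last A of the core, and H is left unpaired.
    """
    target = target.upper().replace("T", "U")
    if not 2 <= h_index < len(target) or target[h_index - 1] != "U" or target[h_index] not in "ACU":
        raise ValueError("no NUH triplet ending at h_index")
    n_index = h_index - 2
    if n_index - arm_len + 1 < 0 or h_index + arm_len >= len(target):
        raise ValueError("target too short for the requested arm length")
    stem1_site = target[h_index + 1 : h_index + 1 + arm_len]   # 3' of cleavage
    stem3_site = target[n_index - arm_len + 1 : n_index + 1]   # N and 5' of it
    return reverse_complement(stem1_site) + CORE + reverse_complement(stem3_site)
```

Because specificity comes entirely from the two arms, lengthening `arm_len` raises the number of base pairs that must form, which is the mechanism behind the high substrate specificity described above.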

[0005] At present, the method mainly applied is one in which a gene encoding a ribozyme is introduced in the form of a plasmid and transcribed in the cell. Because the ribozyme is introduced into the cell as a stable DNA molecule in this method, continuous production of the ribozyme by the cell's transcription system can be expected. Moreover, since no chemical modification or similar treatment is required, there is almost no toxicity to the cell.

[0006] To produce an intracellularly effective ribozyme, however, it is necessary to develop an expression system that is excellent in respect of the following four properties of the ribozyme: (1) transcriptional level, (2) intracellular stability, (3) intracellular localization, and (4) activity of the ribozyme itself. The polymerase II (pol II) system, which transcribes mRNA, has often been used to express ribozymes. The present inventors, however, have adopted the tRNA^(val) promoter of the polymerase III (pol III) system for ribozyme expression. The transcriptional level of a ribozyme in this system is two to three orders of magnitude higher than in the pol II system (M. Cotten, M. L. Birnstiel, EMBO J., 8, 3861 (1989)), and in the pol III system superfluous additional sequences are not readily attached to the ribozyme. In this system, the ribozyme is transcribed joined to the 3′ end of the tRNA via a linker. This linker between the tRNA and the ribozyme has a great influence not only on the higher-order structure of the ribozyme but also on its stability (S. Koseki, T. Tanabe, K. Tani, S. Asano, T. Shioda, T. Shimada, Y. Nagai, J. Ohkawa, K. Taira, J. Virol., 73, 1868 (1999)). However, the optimal length and sequence of the linker are completely unknown, so linker sequences are still chosen by trial and error and intuition.
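Because the linker is transcribed by pol III together with the ribozyme, one property that can be checked before cloning is whether a candidate linker would stop transcription early. The sketch below assumes the common simplification that pol III terminates at a run of four or more T residues on the coding strand, and it also checks the junctions with a hypothetical upstream tail (the last few nucleotides before the linker) and downstream head (the first few after it). The function name and its arguments are illustrative, not part of the described method.

```python
import re

# Simplified pol III termination signal: four or more consecutive T's on the
# coding (non-template) strand. Real termination efficiency also depends on
# the surrounding sequence, which this sketch ignores.
POL3_T_RUN = re.compile(r"T{4,}")

def linker_is_transcribable(linker: str, upstream_tail: str = "", downstream_head: str = "") -> bool:
    """Return False if the linker, or its junctions with the flanking
    sequences, contains a run of >= 4 T's that could end transcription early.
    `upstream_tail` and `downstream_head` should be only the few nucleotides
    adjacent to the linker, since any T-run inside them is also reported."""
    joined = (upstream_tail + linker + downstream_head).upper()
    return POL3_T_RUN.search(joined) is None
```

Such a pre-filter could only discard obviously unsuitable linkers; ranking the remaining ones by stability, localization and activity is still the job of the in-cell selection described in this document.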

[0007] The intracellular localization of a ribozyme expressed by the tRNA transcription system varies with differences in its higher-order structure. An mRNA, the target of the ribozyme, is transcribed as an intron-containing precursor, matured by splicing, and immediately transported to the cytoplasm. Since spliceosomes and many RNA-binding proteins are thought to associate with immature mRNA in the nucleus, it is considerably difficult for a ribozyme to target an mRNA there (S. Koseki, T. Tanabe, K. Tani, S. Asano, T. Shioda, T. Shimada, Y. Nagai, J. Ohkawa, K. Taira, J. Virol., 73, 1868 (1999)). It is therefore more effective for a ribozyme to target an mRNA in the cytoplasm, and a greater effect can be expected if the transcribed ribozyme moves into the cytoplasm and coexists there with its target mRNA. Among ribozymes expressed by the tRNA transcription system, only those in which the first half, the tRNA portion, forms a cloverleaf secondary structure similar, although not identical, to that of the original tRNA were found to move efficiently to the cytoplasm, whereas ribozymes in which this structure is completely disrupted accumulate in the nucleus (S. Koseki, T. Tanabe, K. Tani, S. Asano, T. Shioda, T. Shimada, Y. Nagai, J. Ohkawa, K. Taira, J. Virol., 73, 1868 (1999)). All tRNA-like ribozymes produced so far have been confirmed to move into the cytoplasm and to show effective activity (H. Kawasaki et al., Nature, 393, 284 (1998); T. Kuwabara et al., Nature Biotechnol., 16, 961 (1998); T. Kuwabara et al., Mol. Cell, 2, 617 (1998); T. Kuwabara et al., Proc. Natl. Acad. Sci. USA, 96, 1886 (1999)).

[0008] The higher-order structure of a tRNA-linked ribozyme is governed by the sequence and length of the linker, but its tertiary structure has not yet been analyzed, and the mechanism by which the ribozyme is transported into the cytoplasm has not been elucidated either. Consequently, no tRNA-linked ribozyme has yet been obtained whose higher-order structure is tuned by an optimal linker and which is transported to the cytoplasm far more efficiently.

SUMMARY OF THE INVENTION

[0009] Under such circumstances, the present inventors have attempted to construct a system for screening, in any type of cell, for ribozymes that have an optimal linker sequence and excellent stability and localization.

[0010] The object of the present invention is to provide a method for selecting a functional nucleic acid molecule which is stable within cells.

[0011] The present invention is summarized below.

[0012] (1) A method for selecting a functional nucleic acid molecule functioning in a cell, cytoplasm or nucleus, which comprises: constructing an expression vector which comprises, under the control of a promoter, a candidate nucleic acid sequence comprising a randomized sequence in a portion of a nucleic acid molecule or in another nucleic acid ligated to the nucleic acid molecule; introducing the expression vector into a cell; culturing the cell to express the nucleic acid molecule; collecting and disrupting the cell to prepare an extract of an entire cell, a cytoplasm fraction or a nucleus fraction; incubating the extract of an entire cell, the cytoplasm fraction or the nucleus fraction for a certain period of time; and obtaining a nucleic acid molecule remaining in the extract of an entire cell, the cytoplasm fraction or the nucleus fraction.

[0013] (2) The method of (1) above, which further comprises repeating the same cycle one or more times for the obtained functional nucleic acid molecule, provided that the period of time for incubating the extract of an entire cell, the cytoplasm fraction or the nucleus fraction is longer than that in an immediately preceding cycle.

[0014] (3) The method of (1) or (2) above, wherein the stability or transcriptional level of the nucleic acid molecule is determined using an increase in the amount of the nucleic acid molecule present as an indicator.

[0015] (4) The method of (1) or (2) above, wherein the function of the nucleic acid molecule is determined using its activity as an indicator.

[0016] (5) The method of (1) or (2) above, wherein the intracellular localization of the nucleic acid molecule is determined using the presence of the nucleic acid molecule in the cytoplasm fraction or the nucleus fraction as an indicator.

[0017] (6) The method of (1) or (2) above, wherein a linker is present between the promoter and the nucleic acid molecule, and the linker is randomized.

[0018] (7) The method of any one of (1) to (6) above, wherein the nucleic acid molecule is selected from the group consisting of an antisense RNA, an antisense DNA, an aptamer and a DNA-RNA hybrid.

[0019] (8) The method of any one of (1) to (6) above, wherein the nucleic acid molecule is a nucleic acid enzyme.

[0020] (9) The method of (8) above, wherein the nucleic acid enzyme is a ribozyme.

[0021] (10) The method of (9) above, wherein the ribozyme is a hammerhead ribozyme.

[0022] (11) A novel functional nucleic acid molecule selected by the method according to any one of (1) to (10) above, wherein the functional nucleic acid molecule has an increased transcriptional level, stability or activity within cells, or exhibits altered intracellular localization, when compared to a corresponding control nucleic acid molecule.

[0023] (12) The functional nucleic acid molecule of (11) above, which is a ribozyme.

[0024] (13) The functional nucleic acid molecule of (12) above, wherein the ribozyme is located in the cytoplasm and has high stability and high activity.

[0025] The terms used herein are defined as follows.

[0026] The term “functional nucleic acid” as used herein means a nucleic acid with any biological function found in vivo or within cells, such as an enzyme function, a catalytic function, or a function of biological inhibition or promotion.

[0027] The term “nucleic acid enzyme” as used herein means a nucleic acid having enzyme activity, such as a ribozyme, a DNA enzyme, or a DNA that exhibits enzyme activity upon binding to a metal.

[0028] The term “ribozyme” as used herein refers to an RNA having a catalytic function. The term “catalytic function” refers to the function of specifically cleaving RNA at a specific site.

[0029] The term “hammerhead ribozyme” as used herein refers to a ribozyme that binds to an RNA (especially an mRNA) as a substrate by forming complementary base pairs with it, and that cleaves the phosphodiester bond 3′ of an NUH sequence, where N is A, G, C or U and H is A, C or U. Any such combination can be cleaved, but GUC is cleaved most readily (FIG. 1).
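The NUH rule lends itself to a simple scan of a target sequence. The Python sketch below lists every NUH triplet in an RNA (DNA input is converted by replacing T with U) and reports the 0-based position of H, 3′ of which cleavage would occur. It does not rank sites by accessibility or by the preference for GUC, and the function name `find_nuh_sites` is hypothetical.

```python
import re

# NUH: N = A, C, G or U; then U; then H = A, C or U. The zero-width lookahead
# lets overlapping triplets (e.g. within "AUUU") all be reported.
NUH = re.compile(r"(?=([ACGU]U[ACU]))")

def find_nuh_sites(rna: str) -> list:
    """Return (h_index, triplet) for each NUH triplet in `rna`.

    h_index is the 0-based position of H; cleavage would occur at the
    phosphodiester bond 3' of it, so a triplet ending at the last nucleotide
    is skipped (there is nothing 3' of it to cut off).
    """
    seq = rna.upper().replace("T", "U")
    sites = []
    for m in NUH.finditer(seq):
        h_index = m.start() + 2
        if h_index < len(seq) - 1:
            sites.append((h_index, m.group(1)))
    return sites
```

For example, scanning `"GGUCAUUA"` reports the GUC triplet with H at position 3 and the AUU triplet with H at position 6, while the terminal UUA is skipped.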

[0030] The term “antisense RNA” or “antisense DNA” as used herein refers to a nucleic acid capable of complementarily binding to a target RNA (especially mRNA) or DNA to inhibit a function thereof. Where the target is an mRNA, the antisense RNA binds to the mRNA and inhibits the translation of the mRNA into a protein.

[0031] The term “aptamer” as used herein refers to an RNA molecule capable of binding to a protein with high affinity. It is considered that an aptamer specifically acting on a pathogenic protein can inhibit the function of the protein in a cell.

[0032] The term “polymerase III” (also referred to as “pol III”), as used herein in connection with a promoter, refers to a promoter suitable for the expression of short RNA molecules such as ribozymes; examples of such promoters include a tRNA promoter, a retrovirus pol III promoter and an adenovirus VA1 promoter.

[0033] The term “tRNA^(val) promoter” (also referred to as “tRNA^(val)”) as used herein means a pol III promoter that directs the transcription of short RNA molecules such as tRNA.

[0034] The term “terminator” as used herein refers to a DNA sequence that signals the termination of transcription.

[0035] The term “randomized” or “randomization” as used herein means preparing a pool in which every possible combination of nucleotides selected from A, T (optionally U), G and C has been introduced at the nucleotide positions of a linker sequence.

BRIEF DESCRIPTION OF THE DRAWINGS

[0036] FIG. 1 shows the division of a ribozyme into its substrate region and its enzyme activity-retaining region (FIG. 1a), and the structure of a ribozyme and the cleavage site of a substrate (FIG. 1b).

[0037] FIG. 2 shows the structure of tRNA^(val)-randomized linker sequence-ribozyme (FIG. 2a), and the construction of an expression vector comprising the structure (FIG. 2b).

[0038] FIG. 3 shows a procedure for selecting a highly stable ribozyme. The structure of a control, tRNA-Luc GUA Rz, is the same as in FIG. 2a (SEQ ID NO: 2).

[0039] FIG. 4 shows the results of examining ribozyme stability by hybridization, using the amount of ribozyme present as an indicator. The amount of ribozyme present increases with the number of cycles (i.e., in the order G-1, G-2, G-3); amounts are expressed relative to that in the first pool, which is set to 1.

[0040] FIG. 5 shows the cleavage activity of ribozymes toward luciferase mRNA as a substrate. In this figure, pUC-dt represents an expression vector containing no ribozyme sequence, tRNA-Luc Rz (FIG. 2a) represents a conventional ribozyme (control), and G-1 and G-2 represent ribozymes obtained in the first and second cycles, respectively.

DETAILED DESCRIPTION OF THE INVENTION

[0041] The present invention provides a method for selecting a nucleic acid that is highly stable and functional in a cell or in its cytoplasm or nucleus. According to one embodiment of the present invention, there is provided a method for selecting, under a given selection pressure, a highly stable and active ribozyme from a pool of tRNA-ribozyme expression plasmids in which the linker connecting the tRNA promoter and the ribozyme is randomized.

[0042] The nucleic acid molecule selected by the method of the present invention is characterized in that it is highly functional. The term “highly functional” as used herein means that the nucleic acid molecule is excellent overall in properties such as transcriptional level, intracellular stability, intracellular localization and activity. Various types of highly functional nucleic acids can be selected by altering the type and strength of the selection pressure. The function of a functional nucleic acid likely varies in cells because a nucleic acid can adopt various higher-order structures depending on its sequence, which in turn alters how it is recognized by various proteins. For example, a nucleic acid that is easily recognized by a nuclease is unstable in a cell, whereas a nucleic acid recognized by a transport receptor is localized at the appropriate site.

[0043] By examining the nucleotide sequence of a highly functional nucleic acid selected by the present invention, the sequence features and higher-order structural characteristics necessary for efficient function of the nucleic acid within cells can be clarified. Such information is useful not only for developing highly efficient nucleic acid expression systems for current or future applications in individual organisms, but also for analyzing any protein, pathway or mechanism that determines the fate of a nucleic acid within cells.

[0044] A method for selecting a nucleic acid molecule highly functional within cells will be described hereafter.

[0045] According to the present invention, the method comprises constructing an expression vector which comprises, under the control of a promoter, a candidate nucleic acid sequence comprising a randomized sequence in a portion of a nucleic acid molecule or in another nucleic acid ligated to the nucleic acid molecule; introducing the expression vector into a cell; culturing the cell to express the nucleic acid molecule; collecting and disrupting the cell to prepare an extract of an entire cell, a cytoplasm fraction or a nucleus fraction; incubating the extract of an entire cell, the cytoplasm fraction or the nucleus fraction for a given period of time; and obtaining a nucleic acid molecule remaining in the extract of an entire cell, the cytoplasm fraction or the nucleus fraction (FIG. 3).

[0046] A feature of the present invention is that a structurally variable portion is randomized, either within the nucleic acid molecule or in another nucleic acid (particularly a single-stranded RNA or DNA) ligated to it, and that in vivo screening is carried out in any cell on the basis of selection pressures such as stability, transcriptional level, intracellular localization and activity. Furthermore, the cycle described above can be repeated one or more times for the obtained nucleic acid molecule; when the period for incubating the extract of an entire cell, the cytoplasm fraction or the nucleus fraction is set longer than in the immediately preceding cycle, a more stable and thus more highly active nucleic acid can be obtained (FIGS. 4 and 5).

[0047] The type of the target nucleic acid molecule of the present invention is not particularly limited. Examples of the target nucleic acid molecule include a nucleic acid enzyme (e.g. a ribozyme, a DNA enzyme, a DNA which expresses an enzyme activity by binding to a metal), an antisense RNA, an antisense DNA, an aptamer, a DNA-RNA hybrid, etc. The preferable nucleic acid molecule is a ribozyme, particularly a hammerhead ribozyme (FIG. 1).

[0048] An example of randomization is randomization of a linker sequence located between the sequence encoding the nucleic acid molecule and the promoter sequence (FIG. 2a). The linker is one or more bases long, normally 1 to 1,000 bases and preferably 10 to 100 bases, although its length is not limited thereto.
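Because each randomized position can take one of four nucleotides, the number of possible linker sequences grows as 4^n with linker length n, which quickly exceeds the number of clones any pool can hold. The sketch below computes that diversity and, under the simplifying assumption that clones sample sequences uniformly at random, the expected number of distinct linkers in a pool. The clone count of one million in the example and the equimolar synthesis implied by `random_linker` are hypothetical.

```python
import math
import random

def theoretical_diversity(n: int) -> int:
    """Number of distinct linkers of length n over the alphabet A, C, G, T."""
    return 4 ** n

def expected_distinct(n_clones: int, diversity: int) -> float:
    """Expected number of distinct sequences among n_clones uniform draws
    (with replacement) from `diversity` equally likely sequences:
    D * (1 - (1 - 1/D)**N), evaluated with log1p/expm1 for accuracy."""
    if diversity == 1:
        return 1.0 if n_clones > 0 else 0.0
    return -diversity * math.expm1(n_clones * math.log1p(-1.0 / diversity))

def random_linker(n: int, rng: random.Random) -> str:
    """One randomized linker, as if synthesized from an equimolar base mixture."""
    return "".join(rng.choice("ACGT") for _ in range(n))

rng = random.Random(0)
for n in (10, 20):
    d = theoretical_diversity(n)
    print(n, d, round(expected_distinct(10**6, d)), random_linker(n, rng))
```

For a 10-nt linker, a pool of about a million clones samples only part of the roughly one million possible sequences, while for a 20-nt linker nearly every clone is unique but only a vanishing fraction of sequence space is covered. This is one reason why repeated selection cycles, rather than exhaustive testing, are used.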

[0049] The expression vector comprises a promoter that allows the nucleic acid molecule to be expressed in a cell. The promoter is not particularly limited, but where the nucleic acid molecule is a ribozyme, applicable promoters include polymerase II and polymerase III promoters, preferably a polymerase III promoter such as a tRNA promoter, a retrovirus pol III promoter or an adenovirus VA1 promoter. Of these, a tRNA promoter is preferable, and a tRNA^(val) promoter (also referred to as “tRNA^(val)”) or a variant thereof is more preferable. An example of an expression vector comprising tRNA^(val) is the pUC dt plasmid, which has unique EcoR I and Pst I sites and an ampicillin-resistance gene (Amp^(r)) (Koseki, S. et al., J. Virol. 73, 1868-1877 (1999)). Other vectors that can be used to construct the expression system include pUC19 (Takara Shuzo), pGREEN LANTERN (Lifetech Oriental), pHaMDR (Human Gene Therapy 6, 905-915 (1995)), adenovirus vectors and retrovirus vectors. In addition to the gene of interest and the promoter, the vectors can comprise elements such as a replication origin, a terminator (i.e., a transcription termination sequence) and a selectable marker gene (e.g., an antibiotic-resistance gene or a gene complementing an auxotrophy).

[0050] Common methods can be used to introduce the vector into cells, for example the calcium phosphate method, electroporation, lipofection, microinjection and the liposome method.

[0051] To obtain amplified DNA fragments, PCR is carried out using a synthetic DNA containing a randomized sequence as the template (under the conditions described in Sambrook, J. et al., “Molecular Cloning: A Laboratory Manual”, Cold Spring Harbor Laboratory Press (1989)). The primers may be designed so that a desired restriction enzyme site (preferably identical to a unique restriction site of the expression vector) is formed at each end of the fragments. The resulting DNA fragments are inserted into an expression vector digested with the same restriction enzyme(s), and a bacterial host such as Escherichia coli is transformed with this vector. The transformants are then cultured in an appropriate medium, and antibiotic-resistant clones are selected and used as the randomized pool (FIG. 2b).
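Because the amplified fragments are cut with the restriction enzyme(s) used for cloning, a randomized region that happens to contain one of those sites would be cleaved internally and be lost or truncated. The sketch below assumes EcoR I (GAATTC) and Pst I (CTGCAG), the unique sites of the pUC dt vector mentioned above; the helper names and the two-base "GC" spacer placed outside each site are illustrative assumptions, not the primers actually used.

```python
ECORI = "GAATTC"  # EcoR I recognition site (palindromic)
PSTI = "CTGCAG"   # Pst I recognition site (palindromic)

def count_site(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of `site` in `seq`.
    Both sites are palindromic, so scanning one strand is sufficient."""
    seq, count, start = seq.upper(), 0, 0
    while (hit := seq.find(site, start)) != -1:
        count += 1
        start = hit + 1
    return count

def add_cloning_sites(insert: str, clamp: str = "GC") -> str:
    """Flank an insert with EcoR I (5') and Pst I (3') sites, as PCR primers
    with 5' tails would; `clamp` is a hypothetical spacer outside each site."""
    return clamp + ECORI + insert.upper() + PSTI + clamp

def is_clonable(fragment: str) -> bool:
    """True if each enzyme cuts the fragment exactly once, i.e. only at the
    designed ends and never inside the randomized region or its junctions."""
    return count_site(fragment, ECORI) == 1 and count_site(fragment, PSTI) == 1
```

Sites created at the junctions between the insert and the primer tails are caught as well, because the check runs on the whole flanked fragment rather than on the insert alone.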

[0052] Cells (e.g., mammalian cells such as HeLa cells, or Xenopus oocytes) are then transfected with the randomized pool and subjected to selection pressure. That is, after culture, the cells are disrupted by a mechanical (e.g., homogenization) or chemical (e.g., hypo-osmotic treatment or lysozyme) method to prepare a cell extract, a cytoplasm fraction or a nucleus fraction, which is then incubated for a given period of time (e.g., 30, 60 or 120 minutes). Nucleic acid molecules with high stability remain after this incubation and are collected. In this method, the amount (or transcriptional level) of the nucleic acid molecule present can be determined from the degree of hybridization (e.g., spot size or density) with a DNA probe having a sequence complementary to the nucleic acid molecule. Where the nucleic acid molecule is a nucleic acid enzyme, its function can be assessed by measuring its enzyme activity. The enzyme activity of a ribozyme can be evaluated as cleavage activity, using the activity of the product encoded by the substrate RNA (e.g., luciferase activity) as an indicator (Kuwabara, T. et al., Proc. Natl. Acad. Sci. USA 96, 1886-1891 (1999)): the lower that activity, the higher the cleavage activity. The localization of a nucleic acid molecule can be determined by examining its presence in the cytoplasm or nucleus fraction (e.g., by hybridization).
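Because cleavage activity is read out inversely, through the activity of the product encoded by the substrate RNA, it is convenient to express results as percent inhibition relative to a control that expresses no ribozyme. The sketch below assumes luciferase readings in arbitrary units and uses invented replicate values; the function name and the use of replicate means are illustrative choices, not the protocol of the cited work.

```python
from statistics import mean
from typing import Sequence

def percent_inhibition(with_ribozyme: Sequence[float], control: Sequence[float]) -> float:
    """Percent reduction of mean reporter activity versus the control:
    100 * (1 - mean(with_ribozyme) / mean(control)).
    Lower reporter activity means higher cleavage activity, so larger values
    indicate a more effective ribozyme."""
    control_mean = mean(control)
    if control_mean <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - mean(with_ribozyme) / control_mean)

# Invented luciferase readings (arbitrary units), three replicates each
control = [1000.0, 980.0, 1020.0]
ribozyme = [210.0, 190.0, 200.0]
print(round(percent_inhibition(ribozyme, control), 1))
```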

[0053] In the method of the present invention, the cycle described above, comprising incorporation of the nucleic acid molecule into a vector, transfection of mammalian cells with the vector, and preparation and incubation of the extract of an entire cell, the cytoplasm fraction or the nucleus fraction, can be repeated one or more times for the obtained functional nucleic acid. Repeating the cycle increases the likelihood of obtaining stable and functional nucleic acid molecules (FIGS. 4 and 5). In this case, the incubation period is preferably longer than that of the immediately preceding cycle. For example, if the incubation period in the previous cycle is 30 minutes, it can be 60 minutes in the current cycle and 120 minutes in the next cycle.
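The effect of lengthening the incubation from cycle to cycle can be illustrated with a toy model. It assumes, hypothetically, that each linker variant decays by first-order kinetics in the extract with its own rate constant, and that RT-PCR and recloning re-amplify the survivors without bias; neither assumption is made in the text, and the variant names and rate constants are invented.

```python
import math

def select_round(abundance: dict, decay_rate: dict, minutes: float) -> dict:
    """One selection round: each variant decays as exp(-k * t) in the extract,
    then survivors are re-amplified without bias (renormalized to sum to 1)."""
    survived = {v: a * math.exp(-decay_rate[v] * minutes) for v, a in abundance.items()}
    total = sum(survived.values())
    return {v: s / total for v, s in survived.items()}

def run_cycles(abundance: dict, decay_rate: dict, schedule) -> list:
    """Apply select_round for each incubation time; return the pool composition
    before the first round and after every round."""
    history = [abundance]
    for minutes in schedule:
        abundance = select_round(abundance, decay_rate, minutes)
        history.append(abundance)
    return history

# Hypothetical pool: three linker variants with invented decay rates (per minute)
rates = {"linker_A": 0.05, "linker_B": 0.02, "linker_C": 0.005}
start = {v: 1 / 3 for v in rates}
for cycle, pool in enumerate(run_cycles(start, rates, [30, 60, 120])):
    print(cycle, {v: round(f, 3) for v, f in pool.items()})
```

With the 30, 60 and 120 minute schedule from the text, the most slowly degraded variant comes to dominate the model pool. Because renormalization does not change the ratios between variants, the final composition here depends only on the total incubation time; in practice, cloning bottlenecks and amplification bias would also shape the outcome.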

[0054] Intracellular stability has been studied previously, and an intracellularly stable ribozyme has been confirmed to have higher activity than an unstable one (S. Koseki et al., J. Virol., 73, 1868 (1999)). Clarifying the correlation between the higher-order structure of a ribozyme and its stability has long been desired for designing highly functional ribozymes, but this correlation remains completely unknown. It can, however, be clarified by using the Zuker method (Zuker, M. et al., Nucleic Acids Res. 9, 133-148 (1981)) to predict the secondary structures of highly stable ribozymes screened within cells by the method of the present invention. On that basis, stable ribozymes having high activity can readily be designed by secondary-structure prediction.
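The Zuker method predicts secondary structure by minimizing free energy with experimentally derived nearest-neighbor parameters, which is too involved for a short example. As a deliberately simplified stand-in, the Python sketch below implements the related Nussinov dynamic-programming algorithm, which only maximizes the number of Watson-Crick and G-U pairs subject to a minimum hairpin loop of three nucleotides. It illustrates the dynamic-programming idea only; for actual analysis of selected linker-ribozyme sequences, an energy-based implementation such as mfold or the ViennaRNA package's RNAfold would be the appropriate tool.

```python
# Allowed pairs: Watson-Crick plus G-U wobble
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq: str, min_loop: int = 3):
    """Return (max_pairs, dot_bracket) for `seq` by base-pair maximization.

    This is the Nussinov algorithm, NOT Zuker's free-energy minimization;
    it ignores stacking energies and loop penalties entirely.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of pairs within seq[i..j] (inclusive)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])        # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)     # i pairs with j
            for k in range(i + 1, j):                      # bifurcation at k
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback to recover one optimal structure in dot-bracket notation
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)
```

For example, `nussinov("GGGAAAUCC")` returns `(3, "(((...)))")`: the three 5′ G residues pair with U, C and C around an AAA loop.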

[0055] With regard to the intracellular localization of a ribozyme, a more efficiently transported ribozyme can be selected by screening within cells. For example, a ribozyme that is transported more efficiently to the cytoplasm can be selected by adding an excess of a plasmid encoding an RNA that competes with the ribozyme for transport, thereby saturating the ribozyme's transport pathway.

[0056] Since the screening is carried out in vivo in the present invention, the selected ribozyme is assured of acting effectively in cells. Furthermore, there have been almost no studies focusing on how a ribozyme should be joined to its expression system, and this study is the first attempt to optimize the linker in particular. If an expression system for more highly functional ribozymes can be designed on the basis of the characteristics of the ribozymes obtained by the screening, and such ribozymes are shown to have higher activity than those from conventional expression systems, the system could be applied to individual organisms.

[0057] The present invention further provides a novel nucleic acid molecule that is highly stable and functional within cells and is selected by the above method. An example of such a nucleic acid molecule is a ribozyme. The nucleic acid molecule is characterized in that its intracellular transcriptional level, stability or activity is increased, or its intracellular localization is altered, in comparison with a corresponding control nucleic acid molecule. An example is a ribozyme that is located in the cytoplasm and has high stability and high activity. Any novel nucleic acid molecule having these characteristics that can be selected by the method of the present invention is included in the present invention.

EXAMPLES

[0058] The present invention will be more specifically described in the following examples. The examples are not intended to limit the scope of the invention.

[0059] Example 1

Construction of Initial Pool and Screening of Ribozyme with Good Stability

[0060] According to known methods (Kawasaki, H. et al., Nature 393, 284-289 (1998); Kuwabara, T. et al., Mol. Cell 2, 617-627 (1998)), expression vectors were constructed by inserting a ribozyme sequence downstream of a tRNA^(val) promoter via a linker with a randomized sequence (FIGS. 2a and 2b). This initial (first) pool contained a large number of ribozyme expression vectors comprising linkers with different nucleotide sequences.

[0061] An aliquot of the expression vectors was transfected into HeLa S3 cells using a transfection reagent (TransIT-LT1 reagent; PanVera, Madison, Wis., USA). After 24 hours, the cells were collected and a cell extract was prepared. The extract was kept at 37° C. for a certain period of time to select highly stable ribozymes, using resistance to cellular nucleases as the selection criterion.

[0062] After purification of the ribozymes, double-stranded DNAs encoding the selected ribozyme sequences were obtained by RT-PCR. These DNAs were inserted into plasmids to prepare ribozyme expression vectors. After cloning, some of the clones were sequenced to confirm the sequence of the randomized portion. The series of procedures is shown in FIG. 3.
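When several clones from a round are sequenced, the randomized linker can be extracted from each read by locating the constant sequences on either side of it, and the resulting frequencies show whether particular linkers are being enriched. The flank sequences and reads in the sketch are placeholders; in practice the flanks would be the 3′ end of the tRNA^(val) portion and the start of the ribozyme, which are not given in this text.

```python
from collections import Counter

def extract_linker(read: str, left_flank: str, right_flank: str):
    """Return the sequence between the two constant flanks, or None if either
    flank is missing. The first right flank after the left flank is used, so
    a linker that itself contains the right-flank sequence is truncated."""
    read, left_flank, right_flank = read.upper(), left_flank.upper(), right_flank.upper()
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    return read[start:end]

def linker_frequencies(reads, left_flank: str, right_flank: str) -> dict:
    """Fraction of successfully parsed reads carrying each linker, most common first."""
    counts = Counter()
    for read in reads:
        linker = extract_linker(read, left_flank, right_flank)
        if linker is not None:
            counts[linker] += 1
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.most_common()} if total else {}

# Hypothetical flanks and reads, for illustration only
LEFT, RIGHT = "GGTT", "CTGA"
reads = ["AAGGTTACGTCTGA", "CGGTTACGTCTGAAA", "GGTTGCATCTGA", "NNNNNNNN"]
print(linker_frequencies(reads, LEFT, RIGHT))
```

Comparing these frequencies between successive generations would show which linkers are being enriched by the selection.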

[0063] After the obtained expression vectors were again transfected into HeLa S3 cells, the cells were collected and an extract was prepared by the same procedures as described above. To select more stable ribozymes, the extract was kept at 37° C. for twice as long as in the first cycle. After purification of the ribozymes, double-stranded DNAs were obtained by RT-PCR and inserted into plasmids to prepare ribozyme expression vectors again.

[0064] In the next cycle, the incubation period was doubled again (four times the period used in the first cycle), and the same procedures were performed.
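The escalating selection pressure across cycles (a base incubation time t, then 2t, then 4t) can be sketched in the same way. This is again a hypothetical model assuming first-order decay; the 30-minute base time and the rate constants are placeholders, because the example does not state the actual incubation period.

```python
from __future__ import annotations

import math


def incubation_schedule(first_minutes: float, cycles: int) -> list[float]:
    """Incubation time for each cycle, doubling every cycle: t, 2t, 4t, ..."""
    return [first_minutes * 2 ** i for i in range(cycles)]


def run_selection(
    pool: dict[str, tuple[float, float]], first_minutes: float, cycles: int
) -> list[dict[str, float]]:
    """Return the relative abundance of each variant after every cycle.

    `pool` maps a hypothetical variant name to
    (initial_abundance, decay_rate_per_min). Survivors are renormalized
    after each cycle, mimicking RT-PCR re-amplification and re-cloning.
    """
    abundances = {name: a for name, (a, _) in pool.items()}
    rates = {name: k for name, (_, k) in pool.items()}
    history = []
    for minutes in incubation_schedule(first_minutes, cycles):
        survivors = {
            name: abundances[name] * math.exp(-rates[name] * minutes)
            for name in abundances
        }
        total = sum(survivors.values())
        abundances = {name: amount / total for name, amount in survivors.items()}
        history.append(dict(abundances))
    return history


# Hypothetical pool: variant C is the most nuclease-resistant.
pool = {"A": (0.6, 0.05), "B": (0.3, 0.02), "C": (0.1, 0.005)}
for generation, shares in enumerate(run_selection(pool, 30, 3), start=1):
    print(f"G-{generation}:", {n: round(s, 3) for n, s in shares.items()})
```

Doubling the incubation time each cycle keeps the selection stringent even after the least stable variants have been removed, so the rare stable variant comes to dominate the pool within a few generations.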

Example 2

Stability of Selected Ribozyme

[0065] The stability of the selected ribozymes was evaluated as follows. An aliquot of the bulk expression plasmids (a mixture of ribozyme expression vectors) encoding the ribozymes selected in each cycle (generations 1 to 3, or G-1 to G-3) was transfected into HeLa S3 cells, and after about 24 hours the amount of ribozyme present in the cells was determined by Northern hybridization. Because a more stable ribozyme takes longer to be degraded by cellular nucleases, a larger amount of a stable ribozyme is expected to remain in the cells. RNA extracted from the cells was subjected to agarose gel electrophoresis and transferred to a nucleic acid adsorption membrane. A 5′-labeled DNA probe (5′-aca tca cgt ttc ggc ctt tcg gcc tca tca gcg cgg aat-3′; SEQ ID NO: 1) having a sequence complementary to the ribozyme was then hybridized to the membrane (hybridization conditions: Ambion ULTRAhyb buffer, 42° C., overnight), and the remaining ribozyme was detected with a STORM 830 image analyzer (Molecular Dynamics). The amount of ribozyme remaining increased with each generation (FIG. 4), indicating that more stable ribozymes were enriched in each round of the selection cycle. These results confirm that the method of the present invention works effectively for selecting highly functional nucleic acids within cells.
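The comparison in FIG. 4 rests on relative band intensities. A minimal sketch of how such intensities might be expressed as fold changes relative to the initial pool is shown below. The optional normalization to a loading-control band (for example, a housekeeping RNA) is an assumption not described in this example, and the intensity values are hypothetical.

```python
from __future__ import annotations


def relative_levels(
    ribozyme_signal: dict[str, float],
    reference: str,
    loading_signal: dict[str, float] | None = None,
) -> dict[str, float]:
    """Express each sample's ribozyme band as a fold change over `reference`.

    If `loading_signal` is given (a hypothetical loading-control band per
    sample), each ribozyme signal is divided by it before comparison.
    """
    normalized = {}
    for sample, value in ribozyme_signal.items():
        if loading_signal is not None:
            value = value / loading_signal[sample]
        normalized[sample] = value
    base = normalized[reference]
    if base <= 0:
        raise ValueError("reference signal must be positive")
    return {sample: value / base for sample, value in normalized.items()}


# Hypothetical integrated band intensities (arbitrary units).
signals = {"pool": 100.0, "G-1": 150.0, "G-2": 240.0, "G-3": 300.0}
print(relative_levels(signals, reference="pool"))
```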

Example 3

[0066] Target RNA-Cleavage Activity of Selected Ribozyme in Cells

[0067] Because the ribozyme used in the present invention targets a luciferase gene, whether increased stability raises ribozyme activity was examined by measuring the luciferase mRNA-cleavage activity of the ribozymes obtained in each generation (Kuwabara, T. et al., Proc. Natl. Acad. Sci. USA 96, 1886-1891 (1999)). First, the bulk pools of highly stable ribozyme expression plasmids from the initial pool, G-1 and G-2 were transfected into a transformed cell line derived from HeLa S3 cells that carries a luciferase gene integrated into its genome. After culture for a predetermined period, the effect of the ribozymes was evaluated using luciferase activity in cell extracts as the indicator. As a control, the activity of a tRNA ribozyme targeting the luciferase gene was also evaluated. The luciferase mRNA-cleavage effect became more pronounced with each generation, reaching about 80% inhibition in generation 2 (FIG. 5). These results confirm that the method of the present invention is effective for selecting ribozymes that are highly stable and highly active in cells.
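The inhibition figure reported here is a percentage reduction in luciferase activity. One common way to compute it, shown as a sketch, divides the luciferase reading from ribozyme-transfected cells by that from reference cells. Which reference sample underlies the roughly 80% figure is not stated above, so the choice of reference (for example, mock-transfected cells) and the readings below are assumptions.

```python
def percent_inhibition(ribozyme_rlu: float, reference_rlu: float) -> float:
    """Percent reduction of luciferase activity relative to a reference sample.

    Both arguments are luciferase readings (for example, relative light
    units normalized to protein content) from equivalent cell extracts.
    """
    if reference_rlu <= 0:
        raise ValueError("reference reading must be positive")
    return 100.0 * (1.0 - ribozyme_rlu / reference_rlu)


# Hypothetical readings: 2,000 RLU with ribozyme vs 10,000 RLU in the reference.
print(round(percent_inhibition(2_000, 10_000), 1))
```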

[0068] Industrial Applicability of the Invention

[0069] The method of the present invention can be used to select highly active nucleic acid molecules that function effectively in cells. Furthermore, a highly efficient ribozyme expression system can be developed based on the linker sequence of a highly functional ribozyme, and this expression system can be applied to the treatment of malignant diseases such as cancer and of infectious diseases such as AIDS. The method is effective for selecting not only ribozymes but also various other types of functional nucleic acids, and enables a wide range of applications by changing the type of nucleic acid used and/or the type of selection pressure applied.

CAGGCTGCAC 20 20 base pairs nucleic acid single linear 28 GGACGGCGAC CTCCACCCAC 20 20 base pairs nucleic acid single linear 29 GGGCTCCTCC GACGCCTGAG 20 20 base pairs nucleic acid single linear 30 AGTCTAGCCC TGGCCTTGAC 20 20 base pairs nucleic acid single linear 31 GTCACTGGGG ACTCCGGCAG 20 20 base pairs nucleic acid single linear 32 CAGCTTTCCC TGGGCACATG 20 20 base pairs nucleic acid single linear 33 CACAGCTGTC TCAAGCCCAG 20 20 base pairs nucleic acid single linear 34 ACTGTTCCCC CTACATGATG 20 20 base pairs nucleic acid single linear 35 ATCATATCCT CTTGCTGGTC 20 20 base pairs nucleic acid single linear 36 GTTCCCAGAG CTTGTCTGTG 20 20 base pairs nucleic acid single linear 37 GTTTGGCAGA CTCATAGTTG 20 20 base pairs nucleic acid single linear 38 TAGCAGGGAG CCATGACCTG 20 20 base pairs nucleic acid single linear 39 CTTGGCGCCA GAAGCGAGAG 20 20 base pairs nucleic acid single linear 40 CCTCTCTCTC TCTCTCTCTC 20 20 base pairs nucleic acid single linear 41 TCCCCGCTGA TTCCGCCAAG 20 20 base pairs nucleic acid single linear 42 CTTTTTGAAT TCGGCACGAG 20 20 base pairs nucleic acid single linear 43 CCCCTGGTCC GCACCAGTTC 20 20 base pairs nucleic acid single linear 44 GAGAAGGGTC GGGGCGGCAG 20 20 base pairs nucleic acid single linear 45 AAATCACATC GCGTCAACAC 20 20 base pairs nucleic acid single linear 46 TAAGAGAGTC ATAGTTACTC 20 20 base pairs nucleic acid single linear 47 GCTCTAGAAG TACTCTCGAG 20 20 base pairs nucleic acid single linear 48 ACTCTGGCCA TCAGGAGATC 20 20 base pairs nucleic acid single linear 49 CAGGCGTTGT AGATGTTCTG 20 20 base pairs nucleic acid single linear 50 AGTGGCAGGC AGAAGTAATG 20 20 base pairs nucleic acid single linear 51 GGTTGGAGAA CTGGATGTAG 20 20 base pairs nucleic acid single linear 52 CTATTCAGAT GCAACGCCAG 20 20 base pairs nucleic acid single linear 53 CCATGGCACA CAGAGCAGAC 20 20 base pairs nucleic acid single linear 54 GCTACCATGC AGAGACACAG 20 20 base pairs nucleic acid single linear 55 CAGGCTGACA AGAAAATCAG 20 20 base pairs nucleic acid single linear 56 
GGCACGCATA GAGGAGAGAC 20 20 base pairs nucleic acid single linear 57 TGGGTGATGC CTTTGCTGAC 20 20 base pairs nucleic acid single linear 58 AAAACAAGAT CAAGGTGATG 20 20 base pairs nucleic acid single linear 59 TTGCCCACAT TGCTATGGTG 20 20 base pairs nucleic acid single linear 60 GACCAAGATC AGAAGTAGAG 20 20 base pairs nucleic acid single linear 61 CCCCTGGGCC AATGATGTTG 20 19 base pairs nucleic acid single linear 62 TCTTCCCACC ATAGCAATG 19 20 base pairs nucleic acid single linear 63 TGGTCTTGGT GACCAATGTG 20 20 base pairs nucleic acid single linear 64 ACACCTCGGT GACCCCTGTG 20 20 base pairs nucleic acid single linear 65 TCTCCAAGTT CGGCACAGTG 20 20 base pairs nucleic acid single linear 66 ACATGGGCTG CACTCACGAC 20 20 base pairs nucleic acid single linear 67 GATCCTCTGA ACCTGCAGAG 20 20 base pairs nucleic acid single linear 68 GGAAATGAGG TGGGGCGATC 20 20 base pairs nucleic acid single linear 69 CTTTGCCTTG GACAAGGATG 20 20 base pairs nucleic acid single linear 70 GCACCTGCCA TTGGGGGTAG 20 20 base pairs nucleic acid single linear 71 GGTGGAAGCC ATTGACGGTG 20 20 base pairs nucleic acid single linear 72 TGCGTCTCTC GTCGCTGCTG 20 20 base pairs nucleic acid single linear 73 GCGGAAACTC TGTGGTGCTG 20 20 base pairs nucleic acid single linear 74 AGGATTGCCT TCCTCTACTG 20 20 base pairs nucleic acid single linear 75 TGTCTGTTTC ACCAGGGCAG 20 20 base pairs nucleic acid single linear 76 CCAGTGCCTC TATGCATGTC 20 20 base pairs nucleic acid single linear 77 AGGAAGCCCA CGCACACCAC 20 20 base pairs nucleic acid single linear 78 CCCTTTGTTC CCTGATCTTC 20 20 base pairs nucleic acid single linear 79 CGCTCGGGAT CCAGGTCATC 20 20 base pairs nucleic acid single linear 80 TCGAGGTTCA GAGCGTAGTG 20 20 base pairs nucleic acid single linear 81 TCTTGGATCT CTGGCACCTC 20 20 base pairs nucleic acid single linear 82 CCATCAGAGT GAAGGAGGAG 20 20 base pairs nucleic acid single linear 83 CCATCTTCCA CTGGTCAGAG 20 20 base pairs nucleic acid single linear 84 CTCCTTCTCT TGGATCTCTG 20 20 base pairs nucleic acid single linear 85 
TTACTTCAGC ACTGTTAGTC 20 20 base pairs nucleic acid single linear 86 AGGGAGGTAG CTCAAAGCTC 20 20 base pairs nucleic acid single linear 87 TGGGTCCACA GTTCGCACAG 20 20 base pairs nucleic acid single linear 88 CAACTCTGTG ATGGCTCCAG 20 20 base pairs nucleic acid single linear 89 AGCAGGGTTC TGTTCAAGAC 20 20 base pairs nucleic acid single linear 90 CCATTGGGTG CTAGTCTCTC 20 20 base pairs nucleic acid single linear 91 CAGCCATGCT GTCCCAGCAG 20 20 base pairs nucleic acid single linear 92 CTGGACCTGA GGTAGCGCTG 20 20 base pairs nucleic acid single linear 93 ATAACCACCC TGAGGCACTG 20 20 base pairs nucleic acid single linear 94 CCTGCAGGTC GACACTAGTG 20 20 base pairs nucleic acid single linear 95 AATTGGAATG AGGAGGACTG 20 20 base pairs nucleic acid single linear 96 GCTCTAGAAG TACTCTCGAG 20 20 base pairs nucleic acid single linear 97 ATTGTATGAC AATGCACCAG 20 20 base pairs nucleic acid single linear 98 TCCACAGAGG GCTTCATCAC 20 20 base pairs nucleic acid single linear 99 CCTGACTGGC CTAAGCACAG 20 20 base pairs nucleic acid single linear 100 AAGCCTCATA ACCACCAGTG 20 20 base pairs nucleic acid single linear 101 TGTCAACGGT GACAAGTGTG 20 20 base pairs nucleic acid single linear 102 TTGTACACCA GCTGCAGGTC 20 20 base pairs nucleic acid single linear 103 GGGTGTGGTG CAGATGAGTC 20 20 base pairs nucleic acid single linear 104 ATCACACTCT TATAGCTCAG 20 20 base pairs nucleic acid single linear 105 GTGGGAAGCT TTCCTCAGAC 20 20 base pairs nucleic acid single linear 106 TGATGAACAT GGGCCTGGAG 20 20 base pairs nucleic acid single linear 107 CATTGTGGAT GTACTACCAC 20 20 base pairs nucleic acid single linear 108 TGTGTTTTGC AACCTGAGTG 20 20 base pairs nucleic acid single linear 109 ATAGTGGCAC CACTTACGAG 20 20 base pairs nucleic acid single linear 110 AATTCTGCAA CGTGATGGCG 20 20 base pairs nucleic acid single linear 111 CACAAGATGC CTCGTCTGTG 20 20 base pairs nucleic acid single linear 112 AATCCGGACA AGGTACAGTC 20 20 base pairs nucleic acid single linear 113 GCACGAGTGG CACAAGCGTG 20 20 base pairs nucleic acid 
single linear 114 GCAAGCGTGT GGTGTCAGTG 20 20 base pairs nucleic acid single linear 115 TGTTTGAACA GGCTCTGGAC 20 20 base pairs nucleic acid single linear 116 CGGCATGGCA ATGAGGACAC 20 20 base pairs nucleic acid single linear 117 AGGACGAGAT GGACCTCCAG 20 20 base pairs nucleic acid single linear 118 CCCTCTGTCC TCTAGCCCAC 20 20 base pairs nucleic acid single linear 119 TCTTGAGGGG ACTGACTCTG 20 20 base pairs nucleic acid single linear 120 TGAGTGAGGA GGCAGATGTC 20 20 base pairs nucleic acid single linear 121 TGGCTTTGAA GAAAGAGCTG 20 20 base pairs nucleic acid single linear 122 GCAAAAGACC AGGCTGACTG 20 20 base pairs nucleic acid single linear 123 TGCAGCTCCT TGGTCTTCTC 20 20 base pairs nucleic acid single linear 124 GATTCACAGT CCCAAGGCTC 20 20 base pairs nucleic acid single linear 125 ATCTGGATGA GGCGGTTGAG 20 20 base pairs nucleic acid single linear 126 GGTCACTCTC CGACGAGGAG 20 20 base pairs nucleic acid single linear 127 GGATCCAAAG TTCGTCTCTG 20 20 base pairs nucleic acid single linear 128 CGCTGTGTGT CTGATCCCTC 20 20 base pairs nucleic acid single linear 129 ATGAAGGTAA ACCCCGGGAG 20 20 base pairs nucleic acid single linear 130 TGGTCTCTGG CTCTGAGCAC 20 20 base pairs nucleic acid single linear 131 GCCTGGAGAA GCCCAGTCTG 20 20 base pairs nucleic acid single linear 132 CACACTCTGG ACCGTTGCTG 20 20 base pairs nucleic acid single linear 133 AAAGCTCCGC AGCCGCAGTG 20 20 base pairs nucleic acid single linear 134 TCTTCCAGGA AGCTGCGGTC 20 20 base pairs nucleic acid single linear 135 GATGGTGGGG CAGCATTGAG 20 20 base pairs nucleic acid single linear 136 GTCACCAGTG GTGCCTGCAG 20 20 base pairs nucleic acid single linear 137 ACCTCACGGT TGCCAACCTG 20 20 base pairs nucleic acid single linear 138 CGCAACAGCG TCTCCCTCTG 20 20 base pairs nucleic acid single linear 139 AGTACCTTCA TAAGTTCTTC 20 20 base pairs nucleic acid single linear 140 TCCCAGACTT CAACCTTCAC 20 20 base pairs nucleic acid single linear 141 AAACATCTTC CCGGTCGGAC 20 20 base pairs nucleic acid single linear 142 GCTGAGCACC TTTACCTCAC 
20 20 base pairs nucleic acid single linear 143 GACGTCCGTC CGGGAAGATG 20 20 base pairs nucleic acid single linear 144 ACACAGGAGA TGCAGGTCAC 20 20 base pairs nucleic acid single linear 145 GAGTCTTCCA TGAAGAACAG 20 20 base pairs nucleic acid single linear 146 GCAGTGAGGA AGGTAAGGAG 20 4047 base pairs nucleic acid double linear Genomic DNA Coding Sequence 378...1799 147 GGATCCAAAG GACGCCCCCG CCGACAGGAG AATTGGTTCC CGGGCCCGCG GCGATGCCCC 60 CCCGGTAGCT CGGGCCCGTG GTCGGGTGTT TGTGAGTGTT TCTATGTGGG AGAAGGAGGA 120 GGAGGAGGAA GAAGAAGCAA CGATTTGTCT TCTCGGCTGG TCTCCCCCCG GCTCTACATG 180 TTCCCCGCAC TGAGGAGACG GAAGAGGAGC CGTAGCCGCC CCCCCTCCCG GCCCGGATTA 240 TAGTCTCTCG CCACAGCGGC CTCGGCCTCC CCTTGGATTC AGACGCCGAT TCGCCCAGTG 300 TTTGGGAAAT GGGAAGTAAT GACAGCTGGC ACCTGAACTA AGTACTTTTA TAGGCAACAC 360 CATTCCAGAA ATTCAGG ATG AAT GGG GAT ATG CCC CAT GTC CCC ATT ACT 410 Met Asn Gly Asp Met Pro His Val Pro Ile Thr 1 5 10 ACT CTT GCG GGG ATT GCT AGT CTC ACA GAC CTC CTG AAC CAG CTG CCT 458 Thr Leu Ala Gly Ile Ala Ser Leu Thr Asp Leu Leu Asn Gln Leu Pro 15 20 25 CTT CCA TCT CCT TTA CCT GCT ACA ACT ACA AAG AGC CTT CTC TTT AAT 506 Leu Pro Ser Pro Leu Pro Ala Thr Thr Thr Lys Ser Leu Leu Phe Asn 30 35 40 GCA CGA ATA GCA GAA GAG GTG AAC TGC CTT TTG GCT TGT AGG GAT GAC 554 Ala Arg Ile Ala Glu Glu Val Asn Cys Leu Leu Ala Cys Arg Asp Asp 45 50 55 AAT TTG GTT TCA CAG CTT GTC CAT AGC CTC AAC CAG GTA TCA ACA GAT 602 Asn Leu Val Ser Gln Leu Val His Ser Leu Asn Gln Val Ser Thr Asp 60 65 70 75 CAC ATA GAG TTG AAA GAT AAC CTT GGC AGT GAT GAC CCA GAA GGT GAC 650 His Ile Glu Leu Lys Asp Asn Leu Gly Ser Asp Asp Pro Glu Gly Asp 80 85 90 ATA CCA GTC TTG TTG CAG GCC GTC CTG GCA AGG AGT CCT AAT GTT TTC 698 Ile Pro Val Leu Leu Gln Ala Val Leu Ala Arg Ser Pro Asn Val Phe 95 100 105 AGG GAG AAA AGC ATG CAG AAC AGA TAT GTA CAA AGT GGA ATG ATG ATG 746 Arg Glu Lys Ser Met Gln Asn Arg Tyr Val Gln Ser Gly Met Met Met 110 115 120 TCT CAG TAT AAA CTT TCT CAG AAT TCC ATG CAC AGT AGT CCT GCA TCT 794 Ser Gln Tyr Lys Leu Ser Gln 
Asn Ser Met His Ser Ser Pro Ala Ser 125 130 135 TCC AAT TAT CAA CAA ACC ACT ATC TCA CAT AGC CCC TCC AGC CGG TTT 842 Ser Asn Tyr Gln Gln Thr Thr Ile Ser His Ser Pro Ser Ser Arg Phe 140 145 150 155 GTG CCA CCA CAG ACA AGC TCT GGG AAC AGA TTT ATG CCA CAG CAA AAT 890 Val Pro Pro Gln Thr Ser Ser Gly Asn Arg Phe Met Pro Gln Gln Asn 160 165 170 AGC CCA GTG CCT AGT CCA TAC GCC CCA CAA AGC CCT GCA GGA TAC ATG 938 Ser Pro Val Pro Ser Pro Tyr Ala Pro Gln Ser Pro Ala Gly Tyr Met 175 180 185 CCA TAT TCC CAT CCT TCA AGT TAC ACA ACA CAT CCA CAG ATG CAA CAA 986 Pro Tyr Ser His Pro Ser Ser Tyr Thr Thr His Pro Gln Met Gln Gln 190 195 200 GCA TCG GTA TCA AGT CCC ATT GTT GCA GGT GGT TTG AGA AAC ATA CAT 1034 Ala Ser Val Ser Ser Pro Ile Val Ala Gly Gly Leu Arg Asn Ile His 205 210 215 GAT AAT AAA GTT TCT GGT CCG TTG TCT GGC AAT TCA GCT AAT CAT CAT 1082 Asp Asn Lys Val Ser Gly Pro Leu Ser Gly Asn Ser Ala Asn His His 220 225 230 235 GCT GAT AAT CCT AGA CAT GGT TCA AGT GAG GAC TAC CTA CAC ATG GTG 1130 Ala Asp Asn Pro Arg His Gly Ser Ser Glu Asp Tyr Leu His Met Val 240 245 250 CAC AGG CTA AGT AGT GAC GAT GGA GAT TCT TCA ACA ATG AGG AAT GCT 1178 His Arg Leu Ser Ser Asp Asp Gly Asp Ser Ser Thr Met Arg Asn Ala 255 260 265 GCA TCT TTT CCC TTG AGA TCT CCA CAG CCA GTA TGC TCC CCT GCT GGA 1226 Ala Ser Phe Pro Leu Arg Ser Pro Gln Pro Val Cys Ser Pro Ala Gly 270 275 280 AGT GAA GGA ACT CCT AAA GGC TCA AGA CCA CCT TTA ATC CTA CAA TCT 1274 Ser Glu Gly Thr Pro Lys Gly Ser Arg Pro Pro Leu Ile Leu Gln Ser 285 290 295 CAG TCT CTA CCT TGT TCA TCA CCT CGA GAT GTT CCA CCA GAT ATC TTG 1322 Gln Ser Leu Pro Cys Ser Ser Pro Arg Asp Val Pro Pro Asp Ile Leu 300 305 310 315 CTA GAT TCT CCA GAA AGA AAA CAA AAG AAG CAG AAG AAA ATG AAA TTA 1370 Leu Asp Ser Pro Glu Arg Lys Gln Lys Lys Gln Lys Lys Met Lys Leu 320 325 330 GGC AAG GAT GAA AAA GAG CAG AGT GAG AAA GCG GCA ATG TAT GAT ATA 1418 Gly Lys Asp Glu Lys Glu Gln Ser Glu Lys Ala Ala Met Tyr Asp Ile 335 340 345 ATT AGT TCT CCA TCC AAG GAC TCT ACT AAA CTT ACA TTA AGA 
CTT TCT 1466 Ile Ser Ser Pro Ser Lys Asp Ser Thr Lys Leu Thr Leu Arg Leu Ser 350 355 360 CGT GTA AGG TCT TCA GAC ATG GAC CAG CAA GAG GAT ATG ATT TCT GGT 1514 Arg Val Arg Ser Ser Asp Met Asp Gln Gln Glu Asp Met Ile Ser Gly 365 370 375 GTG GAA AAT AGC AAT GTT TCA GAA AAT GAT ATT CCT TTT AAT GTG CAG 1562 Val Glu Asn Ser Asn Val Ser Glu Asn Asp Ile Pro Phe Asn Val Gln 380 385 390 395 TAC CCA GGA CAG ACT TCA AAA ACA CCC ATT ACT CCA CAA GAT ATA AAC 1610 Tyr Pro Gly Gln Thr Ser Lys Thr Pro Ile Thr Pro Gln Asp Ile Asn 400 405 410 CGC CCA CTA AAT GCT GCT CAA TGT TTG TCG CAG CAA GAA CAA ACA GCA 1658 Arg Pro Leu Asn Ala Ala Gln Cys Leu Ser Gln Gln Glu Gln Thr Ala 415 420 425 TTC CTT CCA GCA AAT CAA GTG CCT GTT TTA CAA CAG AAC ACT TCA GTT 1706 Phe Leu Pro Ala Asn Gln Val Pro Val Leu Gln Gln Asn Thr Ser Val 430 435 440 GCT GCA AAA CAA CCC CAG ACC AAT AGT CAC AAA ACC TTG GTG CAG CCT 1754 Ala Ala Lys Gln Pro Gln Thr Asn Ser His Lys Thr Leu Val Gln Pro 445 450 455 GGA ACA GGC ATA GAG GTC TCA GCA GAG CTG CCC AAG GAC AAG ACC TAAGA 1804 Gly Thr Gly Ile Glu Val Ser Ala Glu Leu Pro Lys Asp Lys Thr 460 465 470 TCCAGCAGGG AACTATGTAG TCACCCCGAG AGGCCCAGCT CTCTCCGTGA GCTCTGGGCC 1864 TAGGGTGGGG GTGGTTGTTG GTTCTGCGCG CACTGTTCCC CCTACATGAT GGGTCCATCC 1924 CAGTTGGCTT CTCTCACTCG CTTCCTCCTG TGGAGAAGCC TGTCCAGGTG TCACTGCCTC 1984 CAGGAAGCTG TCTCTGATTT CTCCAGTTGA ACAGTGAGAT TTGCCACACC TCACATGCAT 2044 CGCTCTTGTC CCTGGAATTG TAACCATAGG TTTTCCTGTC TCCTGGAGGA CAAGGATGAG 2104 GGCTTTCCAC TTGAGTCTCC CTGGTGGAGC CCAGCTCCTG ACATACCTGG TAAAAGTTCT 2164 CAAGAGAAGA ACATGGAGGA GGAATGTGGA TAACAACCCT GGCTGCCTGT GTGTTCCAAG 2224 CTAGGAAGAT GTAATGTCCC CACAAACGGG GTAAATGGCT TGCCTGCGTC ACAGCTGTCT 2284 CAAGCCCAGG CCCTGGGCGC CAGCCCAAGC CCAAGGACTA GGTCCAGAGC CACACAGCGC 2344 CAGGCCACAT CCGCCTCACC TGGGACCCTT TGTGGGGTAC AGTCTCCGGC CCCACCCAGA 2404 CCTCCTGAAG GAGAGACCCC ATGGCAAGGA CTCAGCCACC TGCAGTTTCA TAAGCCCCCA 2464 GTGGGTTCCT AGGCATGAAG ACCACCGGTT AGAGGCTGAA CTGGCAGGAA CCTGTCTCCA 2524 GCCCCTTCTC ACCCCAGCCG GGCCCTGCCT 
CAGAGGCAGC ACCCAGGACG TGGCCATGAC 2584 CCGTGGACTC CACTCAATCC CTCTTCTCCA GGAGCCATGC AAAGTGTCAG CCAGCCAGGC 2644 CCCTGGAAGG CAGTCATCAC CTCTTAAGGC ATTGTGGGTG TCGGTCCTGC AACTGCCAGG 2704 TGCAGCACAC GACCCGTGTC CGGTGTTCGA TAGCAGGGAG CCATGACCTG GCAACGATTC 2764 CACGCTCAAA GGGGCACCCG GGGGGCCCTG GGTCGGGGCG GATCAGCTTT CCCTGGGCAC 2824 ATCTGCCTCA TTCCAGATCT CCAGGGCTCA TGTCTGTGAC AGGGAGGGAA GGCTCTGCCC 2884 TGGCCTTCCG TCAGCTCTGC CAGTGCAGGC TGGGCAGCCT GGGCTTTAGA GCTGGCTTCT 2944 GCCCACACTT TCTCCGTGAA AGGAAAACAA CTATGAGTCT GCCAAACGCA TCTCAGATGC 3004 GTTTTAAAAA ATTCTGGTCC CCGCTCTCTG TCCCATCATC CGCCTCGGGG ACTTCCTCTC 3064 TCCGTGGTTC TCACCCCATA CTCTGTCACT GCCACATTTT CACCTGGGCC TGGCCTTTGT 3124 CTCCACCTGA AACTCCTGAA AATCTTGAAA TGGATTTCTA GGTCACTGGG GACTCCGGCA 3184 GCACATTCGG CTTCAGAATA AAGGGCGCCC GCGGTCCCCC AGCACCTCCC CAAGCCACAC 3244 CCCTAGCTTC CCTCCCTATC CCTGCAGCCT GAGGGTCCCT TCAGCCACCC TTAAGTCCCC 3304 ACCTGGGCTC CTGCCCCGCC CCTGGCTAGC AGCGCCTTCT CCACCGGGGC CCCCTCTGCT 3364 CACAGAGCCC CCTCACCTCC CTGGGGATGA GGGGCCAGGC CATGACCCTG AAAGTCTAGC 3424 CCTGGCCTTG ACCTCCCAGG AGCGCCCTCC CCGCCCTCTC CCGGCCCCGG CCCCGTCCTC 3484 TGCTGCTGGC CTCTGGGTCG TGCCCCGCAG ACTGAGCTGC GCTTGGGGGT CCTGGCGGCC 3544 TGGGCCGTCC CGCACCGAAC CCAGGCGGTC GGAGCCCGGC GGGAAGGCGC GAGGTCCTTC 3604 TGGGGGCTCC TCCGACGCCT GAGGGCGCTG CTTCCCCGCG GCCGCCCCGG GTTTCTGCGG 3664 AGCCGGGGCC TCCGCTCTCG GGTGACCCGG TGAGACCCCC GGGGAGGCCG CTGGGGAGGC 3724 GCGGGCTCTG CTCCCGGGTC CCAAACGCAC TGGCTGCCCC TCAGGAGGGA CGGCGACCTC 3784 CACCCACGGC GCTGGCGCCC GCACGGCCGC TCCTCCCGCT CCCGCAGCCT GGACGCCTCC 3844 CGAGGCCGCC CCGCCGGGCC CCACGCGCGG CCCCATCCGC AGGCCAGGAC TGCCTTCCCG 3904 GAGCTGGCGG CCCCCAGCCT GGAGGAGCCG GCCCCAGACG CCCTCCCAGC CCTCCCCAGC 3964 CCACTCTGGC CCCGCAGCCC CCGCCTGGTC CGAGTGCGGG TCTCTGGCCC CGGCCTTTCC 4024 CGGGGAAGGA AAGCAAAAAG CTT 4047 474 amino acids amino acid single linear protein internal 148 Met Asn Gly Asp Met Pro His Val Pro Ile Thr Thr Leu Ala Gly Ile 1 5 10 15 Ala Ser Leu Thr Asp Leu Leu Asn Gln Leu Pro Leu Pro Ser Pro Leu 20 25 30 Pro Ala Thr Thr 
Thr Lys Ser Leu Leu Phe Asn Ala Arg Ile Ala Glu 35 40 45 Glu Val Asn Cys Leu Leu Ala Cys Arg Asp Asp Asn Leu Val Ser Gln 50 55 60 Leu Val His Ser Leu Asn Gln Val Ser Thr Asp His Ile Glu Leu Lys 65 70 75 80 Asp Asn Leu Gly Ser Asp Asp Pro Glu Gly Asp Ile Pro Val Leu Leu 85 90 95 Gln Ala Val Leu Ala Arg Ser Pro Asn Val Phe Arg Glu Lys Ser Met 100 105 110 Gln Asn Arg Tyr Val Gln Ser Gly Met Met Met Ser Gln Tyr Lys Leu 115 120 125 Ser Gln Asn Ser Met His Ser Ser Pro Ala Ser Ser Asn Tyr Gln Gln 130 135 140 Thr Thr Ile Ser His Ser Pro Ser Ser Arg Phe Val Pro Pro Gln Thr 145 150 155 160 Ser Ser Gly Asn Arg Phe Met Pro Gln Gln Asn Ser Pro Val Pro Ser 165 170 175 Pro Tyr Ala Pro Gln Ser Pro Ala Gly Tyr Met Pro Tyr Ser His Pro 180 185 190 Ser Ser Tyr Thr Thr His Pro Gln Met Gln Gln Ala Ser Val Ser Ser 195 200 205 Pro Ile Val Ala Gly Gly Leu Arg Asn Ile His Asp Asn Lys Val Ser 210 215 220 Gly Pro Leu Ser Gly Asn Ser Ala Asn His His Ala Asp Asn Pro Arg 225 230 235 240 His Gly Ser Ser Glu Asp Tyr Leu His Met Val His Arg Leu Ser Ser 245 250 255 Asp Asp Gly Asp Ser Ser Thr Met Arg Asn Ala Ala Ser Phe Pro Leu 260 265 270 Arg Ser Pro Gln Pro Val Cys Ser Pro Ala Gly Ser Glu Gly Thr Pro 275 280 285 Lys Gly Ser Arg Pro Pro Leu Ile Leu Gln Ser Gln Ser Leu Pro Cys 290 295 300 Ser Ser Pro Arg Asp Val Pro Pro Asp Ile Leu Leu Asp Ser Pro Glu 305 310 315 320 Arg Lys Gln Lys Lys Gln Lys Lys Met Lys Leu Gly Lys Asp Glu Lys 325 330 335 Glu Gln Ser Glu Lys Ala Ala Met Tyr Asp Ile Ile Ser Ser Pro Ser 340 345 350 Lys Asp Ser Thr Lys Leu Thr Leu Arg Leu Ser Arg Val Arg Ser Ser 355 360 365 Asp Met Asp Gln Gln Glu Asp Met Ile Ser Gly Val Glu Asn Ser Asn 370 375 380 Val Ser Glu Asn Asp Ile Pro Phe Asn Val Gln Tyr Pro Gly Gln Thr 385 390 395 400 Ser Lys Thr Pro Ile Thr Pro Gln Asp Ile Asn Arg Pro Leu Asn Ala 405 410 415 Ala Gln Cys Leu Ser Gln Gln Glu Gln Thr Ala Phe Leu Pro Ala Asn 420 425 430 Gln Val Pro Val Leu Gln Gln Asn Thr Ser Val Ala Ala Lys Gln Pro 435 440 445 Gln Thr Asn Ser His Lys Thr Leu 
Val Gln Pro Gly Thr Gly Ile Glu 450 455 460 Val Ser Ala Glu Leu Pro Lys Asp Lys Thr 465 470 2998 base pairs nucleic acid double linear Genomic DNA Coding Sequence 26...799 149 AAGCTTTTTG AATTCGGCAC GAGAT GCT ACA CAG GCT ATA TTT GAA ATA CTG 52 Ala Thr Gln Ala Ile Phe Glu Ile Leu 1 5 GAG AAA TCC TGG TTG CCC CAG AAT TGT ACA CTG GTT GAT ATG AAG ATT 100 Glu Lys Ser Trp Leu Pro Gln Asn Cys Thr Leu Val Asp Met Lys Ile 10 15 20 25 GAA TTT GGT GTT GAT GTA ACC ACC AAA GAA ATT GTT CTT GCT GAT GTT 148 Glu Phe Gly Val Asp Val Thr Thr Lys Glu Ile Val Leu Ala Asp Val 30 35 40 ATT GAC AAT GAT TCC TGG AGA CTC TGG CCA TCA GGA GAT CGA AGC CAA 196 Ile Asp Asn Asp Ser Trp Arg Leu Trp Pro Ser Gly Asp Arg Ser Gln 45 50 55 CAG AAA GAC AAA CAG TCT TAT CGG GAC CTC AAA GAA GTA ACT CCT GAA 244 Gln Lys Asp Lys Gln Ser Tyr Arg Asp Leu Lys Glu Val Thr Pro Glu 60 65 70 GGG CTC CAA ATG GTA AAG AAA AAC TTT GAG TGG GTT GCA GAG AGA GTA 292 Gly Leu Gln Met Val Lys Lys Asn Phe Glu Trp Val Ala Glu Arg Val 75 80 85 GAG TTG CTT TTG AAA TCA GAA AGT CAG TGC AGG GTT GTA GTG TTG ATG 340 Glu Leu Leu Leu Lys Ser Glu Ser Gln Cys Arg Val Val Val Leu Met 90 95 100 105 GGC TCT ACT TCT GAT CTT GGT CAC TGT GAA AAA ATC AAG AAG GCC TGT 388 Gly Ser Thr Ser Asp Leu Gly His Cys Glu Lys Ile Lys Lys Ala Cys 110 115 120 GGA AAT TTT GGC ATT CCA TGT GAA CTT CGA GTA ACA TCT GCG CAT AAA 436 Gly Asn Phe Gly Ile Pro Cys Glu Leu Arg Val Thr Ser Ala His Lys 125 130 135 GGA CCA GAT GAA ACT CTG AGG ATT AAA GCT GAG TAT GAA GGG GAT GGC 484 Gly Pro Asp Glu Thr Leu Arg Ile Lys Ala Glu Tyr Glu Gly Asp Gly 140 145 150 ATT CCT ACT GTA TTT GTG GCA GTG GCA GGC AGA AGT AAT GGT TTG GGA 532 Ile Pro Thr Val Phe Val Ala Val Ala Gly Arg Ser Asn Gly Leu Gly 155 160 165 CCA GTG ATG TCT GGG AAC ACT GCA TAT CCA GTT ATC AGC TGT CCT CCC 580 Pro Val Met Ser Gly Asn Thr Ala Tyr Pro Val Ile Ser Cys Pro Pro 170 175 180 185 CTC ACA CCA GAC TGG GGA GTT CAG GAT GTG TGG TCT TCT CTT CGA CTA 628 Leu Thr Pro Asp Trp Gly Val Gln Asp Val Trp Ser Ser Leu Arg Leu 190 
195 200 CCC AGT GGT CTT GGC TGT TCA ACC GTA CTT TCT CCA GAA GGA TCA GCT 676 Pro Ser Gly Leu Gly Cys Ser Thr Val Leu Ser Pro Glu Gly Ser Ala 205 210 215 CAA TTT GCT GCT CAG ATA TTT GGG TTA AGC AAC CAT TTG GTA TGG AGC 724 Gln Phe Ala Ala Gln Ile Phe Gly Leu Ser Asn His Leu Val Trp Ser 220 225 230 AAA CTG CGA GCA AGC ATT TTG AAC ACA TGG ATT TCC TTG AAG CAG GCT 772 Lys Leu Arg Ala Ser Ile Leu Asn Thr Trp Ile Ser Leu Lys Gln Ala 235 240 245 GAC AAG AAA ATC AGA GAA TGT AAT TTA TAAGAAAGAA TGCCATTGAA TTTTTTA 826 Asp Lys Lys Ile Arg Glu Cys Asn Leu 250 255 GGGGAAAAAC TACAAATTTC TAATTTAGCT GAAGGAAAAT CAAGCAAGAT GAAAAGGTAA 886 TTTTAAATTA GAGAACACAA ATAAAATGTA TTAGTGAATA AATGGTGAGG GTAGGCCTAT 946 TCAGATGCAA GGCCAGCAAT GGGGCTCCCC ATTATCCCCA CCCCTTTGGT CCCAGTCCCC 1006 TTCTCTGCAA TGGGCACGCA TAGAGGAGAG ACAAAGGGTA TTAGACGCAA CATCATTGGC 1066 CCAGGGGAGT CCGAGAAGAG CTGCCATTGG CTGACAGGGC ATTTTCAGGC TCTGTCATTG 1126 GTCAGGGAGC ACACCCCAGC CTGAAGAGTG ATGCCATTGG CCAGGGAGTG GTTTTGTCAT 1186 AGCCGTTGGC TGTGAAGTGG AAGGAAAAGA TCTGGGAATG AAGCCCTGTG GCCAGGAAGA 1246 TAGACAGGGC AGCAACTTCT GGGCCTCCAG GCCCTCTTCC CACCATAGCA ATGTGGGCAA 1306 AACTGGTGTC AGGCCCCAGC CAGAAAAAGG AGCCCAAGCC AGAGGGCAAG TGACAAAGGA 1366 TGTACCATGT CCAATCTCCC ACACCCTGGG GCTGCCCTTC CCAATGTCTT TCTTGATAGC 1426 CAAGTTGGGC TGGGAGCAGC TCACTGCTCC TCTAGCCAGG AGGGTTTCTC AGCTCCTGGA 1486 GGCCGCAGCT TGATGTTGAA CTGCTGCAGG GTCTGCTCCA GCTGTTTCTG GTTCCCAGCA 1546 AAGTAGGCGG ACACAGCATT GTGGAAGAGC AGCAGCTGCT TGTGCATCAC CTTGATCTTG 1606 TTTTCTTCCA GGAACTTGAG CTTGATGGCC ACATCTCCCC GCAGCTTCTC ATACTTGTCC 1666 CGATGGGCCT GGAAAGTGGC CTGGGCACTC TCAAGTCGAC CACGTGTCCC TGCATCCCGG 1726 GGGCCTAGAC TCAGCTCCTC TAAGTCTGTT CGGTAGGCAT CATATTCCAG CCTGGCAGCC 1786 TCATACTGTT TCACAGTCAT GAGCGTGTCT TCCATGGTCT TGGTGACCAA TGTGTTGATG 1846 CTAGAGACAA AGAAGTTCAC GGCTCCTAGC AGCGTTTCCC CATTCTTGCA TAGTAGTTTC 1906 TGTGTCTCTG CATTGTAGCC AAATTCCTCC TGAAGCTCTG GGGACTTCTG GCTGAGGTCA 1966 GCAAAGGCAT CACCCAGTGC ATGCTGGGTC TGCAGCAGGC TGTAGAGGTG GGCTGTCAGT 2026 GCCCGGCCCA GCTGCAGGAC 
ACTCTCATAC TTGCGCTTCG TCTCACGCAG CAACTCAATC 2086 TGCAGCTCTA GCTCCAGGAT TCCGGCGCCT CCACTCCGTC CCCCGCGGGT CTGCTCTGTG 2146 TGCCATGGAC GGCATTGTCC CAGATATAGC CGTTGGTACA AAGCGGGGAT CTGACGAGCT 2206 TTTCTCTACT TGTGTCACTA ACGGACCGTT TATCATGAGC AGCAACTCGG CTTCTGCAGC 2266 AAACGGAAAT GACAGCAAGA AGTTCAAAGG TGACAGCCGA AGTGCAGGCG TCCCCTCTAG 2326 AGTGATCCAC ATCCGGAAGC TCCCCATCGA CGTCACGGAG GGGGAAGTCA TCTCCCTGGG 2386 GCTGCCCTTT GGGAAGGTCA CCAACCTCCT GATGCTGAAG GGGAAAAACC AGGCCTTCAT 2446 CGAGATGAAC ACGGAGGAGG CTGCCAATAC CATGGTGAAC TACTACACCT CGGTGACCCC 2506 TGTGCTGCGC GGCCAGCCCA TCTACATCCA GTTCTCCAAC CACAAGGAGC TGAAGACCGA 2566 CAGCTCTCCC AACCAGGCGC GGGCCCAGGC GGCCCTGCAG GCGGTGAACT CGGTCCAGTC 2626 GGGGAACCTG GCCTTGGCTG CCTCGGCGGC GGCCGTGGAT GCAGGGATGG CGATGGCCGG 2686 GCAGAGCCCC GTGCTCAGGA TCATCGTGGA GAACCTCTTC TACCCTGTGA CCCTGGATGT 2746 GCTGCACCAG ATTTTCTCCA AGTTCGGCAC AGTGTTGAAG ATCATCACCT TCACCAAGAA 2806 CAACCAGTTC CAGGCCCTGC TGCAGTATGC GGACCCCGTG AGCGCCCAGC ACGCCAAGCT 2866 GTCGCTGGAC GGGCAGAACA TCTACAACGC CTGCTGCACG CTGCGCATCG ACTTTTCCAA 2926 GCTCACCAGC CTCAACGTCA AGTACAACAA TGACAAGAGC CGTGACTACC TCGTGCCGAA 2986 TTCTTTGGAT CC 2998 258 amino acids amino acid single linear protein internal 150 Ala Thr Gln Ala Ile Phe Glu Ile Leu Glu Lys Ser Trp Leu Pro Gln 1 5 10 15 Asn Cys Thr Leu Val Asp Met Lys Ile Glu Phe Gly Val Asp Val Thr 20 25 30 Thr Lys Glu Ile Val Leu Ala Asp Val Ile Asp Asn Asp Ser Trp Arg 35 40 45 Leu Trp Pro Ser Gly Asp Arg Ser Gln Gln Lys Asp Lys Gln Ser Tyr 50 55 60 Arg Asp Leu Lys Glu Val Thr Pro Glu Gly Leu Gln Met Val Lys Lys 65 70 75 80 Asn Phe Glu Trp Val Ala Glu Arg Val Glu Leu Leu Leu Lys Ser Glu 85 90 95 Ser Gln Cys Arg Val Val Val Leu Met Gly Ser Thr Ser Asp Leu Gly 100 105 110 His Cys Glu Lys Ile Lys Lys Ala Cys Gly Asn Phe Gly Ile Pro Cys 115 120 125 Glu Leu Arg Val Thr Ser Ala His Lys Gly Pro Asp Glu Thr Leu Arg 130 135 140 Ile Lys Ala Glu Tyr Glu Gly Asp Gly Ile Pro Thr Val Phe Val Ala 145 150 155 160 Val Ala Gly Arg Ser Asn Gly Leu Gly Pro Val Met Ser 
Gly Asn Thr 165 170 175 Ala Tyr Pro Val Ile Ser Cys Pro Pro Leu Thr Pro Asp Trp Gly Val 180 185 190 Gln Asp Val Trp Ser Ser Leu Arg Leu Pro Ser Gly Leu Gly Cys Ser 195 200 205 Thr Val Leu Ser Pro Glu Gly Ser Ala Gln Phe Ala Ala Gln Ile Phe 210 215 220 Gly Leu Ser Asn His Leu Val Trp Ser Lys Leu Arg Ala Ser Ile Leu 225 230 235 240 Asn Thr Trp Ile Ser Leu Lys Gln Ala Asp Lys Lys Ile Arg Glu Cys 245 250 255 Asn Leu 1038 amino acids amino acid single linear 151 Ile Gln Arg Phe Gly Thr Ser Gly His Ile Met Asn Leu Gln Ala Gln 1 5 10 15 Pro Lys Ala Gln Asn Lys Arg Lys Arg Cys Leu Phe Gly Gly Gln Glu 20 25 30 Pro Ala Pro Lys Glu Gln Pro Pro Pro Leu Gln Pro Pro Gln Gln Ser 35 40 45 Ile Arg Val Lys Glu Glu Gln Tyr Leu Gly His Glu Gly Pro Gly Gly 50 55 60 Ala Val Ser Thr Ser Gln Pro Val Glu Leu Pro Pro Pro Ser Ser Leu 65 70 75 80 Ala Leu Leu Asn Ser Val Val Tyr Gly Pro Glu Arg Thr Ser Ala Ala 85 90 95 Met Leu Ser Gln Gln Val Ala Ser Val Lys Trp Pro Asn Ser Val Met 100 105 110 Ala Pro Gly Arg Gly Pro Glu Arg Gly Gly Gly Gly Gly Val Ser Asp 115 120 125 Ser Ser Trp Gln Gln Gln Pro Gly Gln Pro Pro Pro His Ser Thr Trp 130 135 140 Asn Cys His Ser Leu Ser Leu Tyr Ser Ala Thr Lys Gly Ser Pro His 145 150 155 160 Pro Gly Val Gly Val Pro Thr Tyr Tyr Asn His Pro Glu Ala Leu Lys 165 170 175 Arg Glu Lys Ala Gly Gly Pro Gln Leu Asp Arg Tyr Val Arg Pro Met 180 185 190 Met Pro Gln Lys Val Gln Leu Glu Val Gly Arg Pro Gln Ala Pro Leu 195 200 205 Asn Ser Phe His Ala Ala Lys Lys Pro Pro Asn Gln Ser Leu Pro Leu 210 215 220 Gln Pro Phe Gln Leu Ala Phe Gly His Gln Val Asn Arg Gln Val Phe 225 230 235 240 Arg Gln Gly Pro Pro Pro Pro Asn Pro Val Ala Ala Phe Pro Pro Gln 245 250 255 Lys Gln Gln Gln Gln Gln Gln Pro Gln Gln Gln Gln Gln Gln Gln Gln 260 265 270 Ala Ala Leu Pro Gln Met Pro Leu Phe Glu Asn Phe Tyr Ser Met Pro 275 280 285 Gln Gln Pro Ser Gln Gln Pro Gln Asp Phe Gly Leu Gln Pro Ala Gly 290 295 300 Pro Leu Gly Gln Ser His Leu Ala His His Ser Met Ala Pro Tyr Pro 305 310 315 320 Phe Pro Pro Asn Pro 
Asp Met Asn Pro Glu Leu Arg Lys Ala Leu Leu 325 330 335 Gln Asp Ser Ala Pro Gln Pro Ala Leu Pro Gln Val Gln Ile Pro Phe 340 345 350 Pro Arg Arg Ser Arg Arg Leu Ser Lys Glu Gly Ile Leu Pro Pro Ser 355 360 365 Ala Leu Asp Gly Ala Gly Thr Gln Pro Gly Gln Glu Ala Thr Gly Asn 370 375 380 Leu Phe Leu His His Trp Pro Leu Gln Gln Pro Pro Pro Gly Ser Leu 385 390 395 400 Gly Gln Pro His Pro Glu Ala Leu Gly Phe Pro Leu Glu Leu Arg Glu 405 410 415 Ser Gln Leu Leu Pro Asp Gly Glu Arg Leu Ala Pro Asn Gly Arg Glu 420 425 430 Arg Glu Ala Pro Ala Met Gly Ser Glu Glu Gly Met Arg Ala Val Ser 435 440 445 Thr Gly Asp Cys Gly Gln Val Leu Arg Gly Gly Val Ile Gln Ser Thr 450 455 460 Arg Arg Arg Arg Arg Ala Ser Gln Glu Ala Asn Leu Leu Thr Leu Ala 465 470 475 480 Gln Lys Ala Val Glu Leu Ala Ser Leu Gln Asn Ala Lys Asp Gly Ser 485 490 495 Gly Ser Glu Glu Lys Arg Lys Ser Val Leu Ala Ser Thr Thr Lys Cys 500 505 510 Gly Val Glu Phe Ser Glu Pro Ser Leu Ala Thr Lys Arg Ala Arg Glu 515 520 525 Asp Ser Gly Met Val Pro Leu Ile Ile Pro Val Ser Val Pro Val Arg 530 535 540 Thr Val Asp Pro Thr Glu Ala Ala Gln Ala Gly Gly Leu Asp Glu Asp 545 550 555 560 Gly Lys Gly Leu Glu Gln Asn Pro Ala Glu His Lys Pro Ser Val Ile 565 570 575 Val Thr Arg Arg Arg Ser Thr Arg Ile Pro Gly Thr Asp Ala Gln Ala 580 585 590 Gln Ala Glu Asp Met Asn Val Lys Leu Glu Gly Glu Pro Ser Val Arg 595 600 605 Lys Pro Lys Gln Arg Pro Arg Pro Glu Pro Leu Ile Ile Pro Thr Lys 610 615 620 Ala Gly Thr Phe Ile Ala Pro Pro Val Tyr Ser Asn Ile Thr Pro Tyr 625 630 635 640 Gln Ser His Leu Arg Ser Pro Val Arg Leu Ala Asp His Pro Ser Glu 645 650 655 Arg Ser Phe Glu Leu Pro Pro Tyr Thr Pro Pro Pro Ile Leu Ser Pro 660 665 670 Val Arg Glu Gly Ser Gly Leu Tyr Phe Asn Ala Ile Ile Ser Thr Ser 675 680 685 Thr Ile Pro Ala Pro Pro Pro Ile Thr Pro Lys Ser Ala His Arg Thr 690 695 700 Leu Leu Arg Thr Asn Ser Ala Glu Val Thr Pro Pro Val Leu Ser Val 705 710 715 720 Met Gly Glu Ala Thr Pro Val Ser Ile Glu Pro Arg Ile Asn Val Gly 725 730 735 Ser Arg Phe Gln Ala Glu 
Ile Pro Leu Met Arg Asp Arg Ala Leu Ala 740 745 750 Ala Ala Asp Pro His Lys Ala Asp Leu Val Trp Gln Pro Trp Glu Asp 755 760 765 Leu Glu Ser Ser Arg Glu Lys Gln Arg Gln Val Glu Asp Leu Leu Thr 770 775 780 Ala Ala Cys Ser Ser Ile Phe Pro Gly Ala Gly Thr Asn Gln Glu Leu 785 790 795 800 Ala Leu His Cys Leu His Glu Ser Arg Gly Asp Ile Leu Glu Thr Leu 805 810 815 Asn Lys Leu Leu Leu Lys Lys Pro Leu Arg Pro His Asn His Pro Leu 820 825 830 Ala Thr Tyr His Tyr Thr Gly Ser Asp Gln Trp Lys Met Ala Glu Arg 835 840 845 Lys Leu Phe Asn Lys Gly Ile Ala Ile Tyr Lys Lys Asp Phe Phe Leu 850 855 860 Val Gln Lys Leu Ile Gln Thr Lys Thr Val Ala Gln Cys Val Glu Phe 865 870 875 880 Tyr Tyr Thr Tyr Lys Lys Gln Val Lys Ile Gly Arg Asn Gly Thr Leu 885 890 895 Thr Phe Gly Asp Val Asp Thr Ser Asp Glu Lys Ser Ala Gln Glu Glu 900 905 910 Val Glu Val Asp Ile Lys Thr Ser Gln Lys Phe Pro Arg Val Pro Leu 915 920 925 Pro Arg Arg Glu Ser Pro Ser Glu Glu Arg Leu Glu Pro Lys Arg Glu 930 935 940 Val Lys Glu Pro Arg Lys Glu Gly Glu Glu Glu Val Pro Glu Ile Gln 945 950 955 960 Glu Lys Glu Glu Gln Glu Glu Gly Arg Glu Arg Ser Arg Arg Ala Ala 965 970 975 Ala Val Lys Ala Thr Gln Thr Leu Gln Ala Asn Glu Ser Ala Ser Asp 980 985 990 Ile Leu Ile Leu Arg Ser His Glu Ser Asn Ala Pro Gly Ser Ala Gly 995 1000 1005 Gly Gln Ala Ser Glu Lys Pro Arg Glu Gly Thr Gly Lys Ser Arg Arg 1010 1015 1020 Ala Leu Pro Phe Ser Glu Lys Lys Lys Lys Lys Gln Lys Ala 1025 1030 1035 849 amino acids amino acid single linear 152 Ile Arg His Glu Val Ser Phe Leu Trp Asn Thr Glu Ala Ala Cys Pro 1 5 10 15 Ile Gln Thr Thr Thr Asp Thr Asp Gln Ala Cys Ser Ile Arg Asp Pro 20 25 30 Asn Ser Gly Phe Val Phe Asn Leu Asn Pro Leu Asn Ser Ser Gln Gly 35 40 45 Tyr Asn Val Ser Gly Ile Gly Lys Ile Phe Met Phe Asn Val Cys Gly 50 55 60 Thr Met Pro Val Cys Gly Thr Ile Leu Gly Lys Pro Ala Ser Gly Cys 65 70 75 80 Glu Ala Glu Thr Gln Thr Glu Glu Leu Lys Asn Trp Lys Pro Ala Arg 85 90 95 Pro Val Gly Ile Glu Lys Ser Leu Gln Leu Ser Thr Glu Gly Phe Ile 100 105 110 Thr 
Leu Thr Tyr Lys Gly Pro Leu Ser Ala Lys Gly Thr Ala Asp Ala 115 120 125 Phe Ile Val Arg Phe Val Cys Asn Asp Asp Val Tyr Ser Gly Pro Leu 130 135 140 Lys Phe Leu His Gln Asp Ile Asp Ser Gly Gln Gly Ile Arg Asn Thr 145 150 155 160 Tyr Phe Glu Phe Glu Thr Ala Leu Ala Cys Val Pro Ser Pro Val Asp 165 170 175 Cys Gln Val Thr Asp Leu Ala Gly Asn Glu Tyr Asp Leu Thr Gly Leu 180 185 190 Ser Thr Val Arg Lys Pro Trp Thr Ala Val Asp Thr Ser Val Asp Gly 195 200 205 Arg Lys Arg Thr Phe Tyr Leu Ser Val Cys Asn Pro Leu Pro Tyr Ile 210 215 220 Pro Gly Cys Gln Gly Ser Ala Val Gly Ser Cys Leu Val Ser Glu Gly 225 230 235 240 Asn Ser Trp Asn Leu Gly Val Val Gln Met Ser Pro Gln Ala Ala Ala 245 250 255 Asn Gly Ser Leu Ser Ile Met Tyr Val Asn Gly Asp Lys Cys Gly Asn 260 265 270 Gln Arg Phe Ser Thr Arg Ile Thr Phe Glu Cys Ala Gln Ile Ser Gly 275 280 285 Ser Pro Ala Phe Gln Leu Gln Asp Gly Cys Glu Tyr Val Phe Ile Trp 290 295 300 Arg Thr Val Glu Ala Cys Pro Val Val Arg Val Glu Gly Asp Asn Cys 305 310 315 320 Glu Val Lys Asp Pro Arg His Gly Asn Leu Tyr Asp Leu Lys Pro Leu 325 330 335 Gly Leu Asn Asp Thr Ile Val Ser Ala Gly Glu Tyr Thr Tyr Tyr Phe 340 345 350 Arg Val Cys Gly Lys Leu Ser Ser Asp Val Cys Pro Thr Ser Asp Lys 355 360 365 Ser Lys Val Val Ser Ser Cys Gln Glu Lys Arg Glu Pro Gln Gly Phe 370 375 380 His Lys Val Ala Gly Leu Leu Thr Gln Lys Leu Thr Tyr Glu Asn Gly 385 390 395 400 Leu Leu Lys Met Asn Phe Thr Gly Gly Asp Thr Cys His Lys Val Tyr 405 410 415 Gln Arg Ser Thr Ala Ile Phe Phe Tyr Cys Asp Arg Gly Thr Gln Arg 420 425 430 Pro Val Phe Leu Lys Glu Thr Ser Asp Cys Ser Tyr Leu Phe Glu Trp 435 440 445 Arg Thr Gln Tyr Ala Cys Pro Pro Phe Asp Leu Thr Glu Cys Ser Phe 450 455 460 Lys Asp Gly Ala Gly Asn Ser Phe Asp Leu Ser Ser Leu Ser Arg Tyr 465 470 475 480 Ser Asp Asn Trp Glu Ala Ile Thr Gly Thr Gly Asp Pro Glu His Tyr 485 490 495 Leu Ile Asn Val Cys Lys Ser Leu Ala Pro Gln Ala Gly Thr Glu Pro 500 505 510 Cys Pro Pro Glu Ala Ala Ala Cys Leu Leu Gly Gly Ser Lys Pro Val 515 520 525 Asn Leu 
Gly Arg Val Arg Asp Gly Pro Gln Trp Arg Asp Gly Ile Ile 530 535 540 Val Leu Lys Tyr Val Asp Gly Asp Leu Cys Pro Asp Gly Ile Arg Lys 545 550 555 560 Lys Ser Thr Thr Ile Arg Phe Thr Cys Ser Glu Ser Gln Val Asn Ser 565 570 575 Arg Pro Met Phe Ile Ser Ala Val Glu Asp Cys Glu Tyr Thr Phe Ala 580 585 590 Trp Pro Thr Ala Thr Ala Cys Pro Met Lys Ser Asn Glu His Asp Asp 595 600 605 Cys Gln Val Thr Asn Pro Ser Thr Gly His Leu Phe Asp Leu Ser Ser 610 615 620 Leu Ser Gly Arg Ala Gly Phe Thr Ala Ala Tyr Ser Glu Lys Gly Leu 625 630 635 640 Val Tyr Met Ser Ile Cys Gly Glu Asn Glu Asn Cys Pro Pro Gly Val 645 650 655 Gly Ala Cys Phe Gly Gln Thr Arg Ile Ser Val Gly Lys Ala Asn Lys 660 665 670 Arg Leu Arg Tyr Val Asp Gln Val Leu Gln Leu Val Tyr Lys Asp Gly 675 680 685 Ser Pro Cys Pro Ser Lys Ser Gly Leu Ser Tyr Lys Ser Val Ile Ser 690 695 700 Phe Val Cys Arg Pro Glu Ala Gly Pro Thr Asn Arg Pro Met Leu Ile 705 710 715 720 Ser Leu Asp Lys Gln Thr Cys Thr Leu Phe Phe Ser Trp His Thr Pro 725 730 735 Leu Ala Cys Glu Gln Ala Thr Glu Cys Ser Val Arg Asn Gly Ser Ser 740 745 750 Ile Val Asp Leu Ser Pro Leu Ile His Arg Thr Gly Gly Tyr Glu Ala 755 760 765 Tyr Asp Glu Ser Glu Asp Asp Ala Ser Asp Thr Asn Pro Asp Phe Tyr 770 775 780 Ile Asn Ile Cys Gln Pro Leu Asn Pro Met His Gly Val Pro Cys Pro 785 790 795 800 Ala Gly Ala Ala Val Cys Lys Val Pro Ile Asp Gly Pro Pro Ile Asp 805 810 815 Ile Gly Arg Val Ala Gly Pro Pro Ile Leu Asn Pro Ile Ala Asn Glu 820 825 830 Ile Tyr Leu Asn Phe Glu Ser Ser Thr Pro Cys Gln Glu Phe Ser Cys 835 840 845 Lys 852 amino acids amino acid single linear 153 Met Ala Arg Leu Ser Arg Pro Glu Arg Pro Asp Leu Val Phe Glu Glu 1 5 10 15 Glu Asp Leu Pro Tyr Glu Glu Glu Ile Met Arg Asn Gln Phe Ser Val 20 25 30 Lys Cys Trp Leu His Tyr Ile Glu Phe Lys Gln Gly Ala Pro Lys Pro 35 40 45 Arg Leu Asn Gln Leu Tyr Glu Arg Ala Leu Lys Leu Leu Pro Cys Ser 50 55 60 Tyr Lys Leu Trp Tyr Arg Tyr Leu Lys Ala Arg Arg Ala Gln Val Lys 65 70 75 80 His Arg Cys Val Thr Asp Pro Ala Tyr Glu Asp Val Asn 
Asn Cys His 85 90 95 Glu Arg Ala Phe Val Phe Met His Lys Met Pro Arg Leu Trp Leu Asp 100 105 110 Tyr Cys Gln Phe Leu Met Asp Gln Gly Arg Val Thr His Thr Arg Arg 115 120 125 Thr Phe Asp Arg Ala Leu Arg Ala Leu Pro Ile Thr Gln His Ser Arg 130 135 140 Ile Trp Pro Leu Tyr Leu Arg Phe Leu Arg Ser His Pro Leu Pro Glu 145 150 155 160 Thr Ala Val Arg Gly Tyr Arg Arg Phe Leu Lys Leu Ser Pro Glu Ser 165 170 175 Ala Glu Glu Tyr Ile Glu Tyr Leu Lys Ser Ser Asp Arg Leu Asp Glu 180 185 190 Ala Ala Gln Arg Leu Ala Thr Val Val Asn Asp Glu Arg Phe Val Ser 195 200 205 Lys Ala Gly Lys Ser Asn Tyr Gln Leu Trp His Glu Leu Cys Asp Leu 210 215 220 Ile Ser Gln Asn Pro Asp Lys Val Gln Ser Leu Asn Val Asp Ala Ile 225 230 235 240 Ile Arg Gly Gly Leu Thr Arg Phe Thr Asp Gln Leu Gly Lys Leu Trp 245 250 255 Cys Ser Leu Ala Asp Tyr Tyr Ile Arg Ser Gly His Phe Glu Lys Ala 260 265 270 Arg Asp Val Tyr Glu Glu Ala Ile Arg Thr Val Met Thr Val Arg Asp 275 280 285 Phe Thr Gln Val Phe Asp Ser Tyr Ala Gln Phe Glu Glu Ser Met Ile 290 295 300 Ala Ala Lys Met Glu Thr Ala Ser Glu Leu Gly Arg Glu Glu Glu Asp 305 310 315 320 Asp Val Asp Leu Glu Leu Arg Leu Ala Arg Phe Glu Gln Leu Ile Ser 325 330 335 Arg Arg Pro Leu Leu Leu Asn Ser Val Leu Leu Arg Gln Asn Pro His 340 345 350 His Val His Glu Trp His Lys Arg Val Ala Leu His Gln Gly Arg Pro 355 360 365 Arg Glu Ile Ile Asn Thr Tyr Thr Glu Ala Val Gln Thr Val Asp Pro 370 375 380 Phe Lys Ala Thr Gly Lys Pro His Thr Leu Trp Val Ala Phe Ala Lys 385 390 395 400 Phe Tyr Glu Asp Asn Gly Gln Leu Asp Asp Ala Arg Val Ile Leu Glu 405 410 415 Lys Ala Thr Lys Val Asn Phe Lys Gln Val Asp Asp Leu Ala Ser Val 420 425 430 Trp Cys Gln Cys Gly Glu Leu Glu Leu Arg His Glu Asn Tyr Asp Glu 435 440 445 Ala Leu Arg Leu Leu Arg Lys Ala Thr Ala Leu Pro Ala Arg Arg Ala 450 455 460 Glu Tyr Phe Asp Gly Ser Glu Pro Val Gln Asn Arg Val Tyr Lys Ser 465 470 475 480 Leu Lys Val Trp Ser Met Leu Ala Asp Leu Glu Glu Ser Leu Gly Thr 485 490 495 Phe Gln Ser Thr Lys Ala Val Tyr Asp Arg Ile Leu Asp Leu 
Arg Ile 500 505 510 Ala Thr Pro Gln Ile Val Ile Asn Tyr Ala Met Phe Leu Glu Glu His 515 520 525 Lys Tyr Phe Glu Glu Ser Phe Lys Ala Tyr Glu Arg Gly Ile Ser Leu 530 535 540 Phe Lys Trp Pro Asn Val Ser Asp Ile Trp Ser Thr Tyr Leu Thr Lys 545 550 555 560 Phe Ile Ala Arg Tyr Gly Gly Arg Lys Leu Glu Arg Ala Arg Asp Leu 565 570 575 Phe Glu Gln Ala Leu Asp Gly Cys Pro Pro Lys Tyr Ala Lys Thr Leu 580 585 590 Tyr Leu Leu Tyr Ala Gln Leu Glu Glu Glu Trp Gly Leu Ala Arg His 595 600 605 Ala Met Ala Val Tyr Glu Arg Ala Thr Arg Ala Val Glu Pro Ala Gln 610 615 620 Gln Tyr Asp Met Phe Asn Ile Tyr Ile Lys Arg Ala Ala Glu Ile Tyr 625 630 635 640 Gly Val Thr His Thr Arg Gly Ile Tyr Gln Lys Ala Ile Glu Val Leu 645 650 655 Ser Asp Glu His Ala Arg Glu Met Cys Leu Arg Phe Ala Asp Met Glu 660 665 670 Cys Lys Leu Gly Glu Ile Asp Arg Ala Arg Ala Ile Tyr Ser Phe Cys 675 680 685 Ser Gln Ile Cys Asp Pro Arg Thr Thr Gly Ala Phe Trp Gln Thr Trp 690 695 700 Lys Asp Phe Glu Val Arg His Gly Asn Glu Asp Thr Ile Lys Glu Met 705 710 715 720 Leu Arg Ile Arg Arg Ser Val Gln Ala Thr Tyr Asn Thr Gln Val Asn 725 730 735 Phe Met Ala Ser Gln Met Leu Lys Val Ser Gly Ser Ala Thr Gly Thr 740 745 750 Val Ser Asp Leu Ala Pro Gly Gln Ser Gly Met Asp Asp Met Lys Leu 755 760 765 Leu Glu Gln Arg Ala Glu Gln Leu Ala Ala Glu Ala Glu Arg Asp Gln 770 775 780 Pro Leu Arg Ala Gln Ser Lys Ile Leu Phe Val Arg Ser Asp Ala Ser 785 790 795 800 Arg Glu Glu Leu Ala Glu Leu Ala Gln Gln Val Asn Pro Glu Glu Ile 805 810 815 Gln Leu Gly Glu Asp Glu Asp Glu Asp Glu Met Asp Leu Glu Pro Asn 820 825 830 Glu Val Arg Leu Glu Gln Gln Ser Val Pro Ala Ala Val Phe Gly Ser 835 840 845 Leu Lys Glu Asp 850 693 amino acids amino acid single linear 154 Met Phe Ser Ala Leu Lys Lys Leu Val Gly Ser Asp Gln Ala Pro Gly 1 5 10 15 Arg Asp Lys Asn Ile Pro Ala Gly Leu Gln Ser Met Asn Gln Ala Leu 20 25 30 Gln Arg Arg Phe Ala Lys Gly Val Gln Tyr Asn Met Lys Ile Val Ile 35 40 45 Arg Gly Asp Arg Asn Thr Gly Lys Thr Ala Leu Trp His Arg Leu Gln 50 55 60 Gly Arg 
Pro Phe Val Glu Glu Tyr Ile Pro Thr Gln Glu Ile Gln Val 65 70 75 80 Thr Ser Ile His Trp Ser Tyr Lys Thr Thr Asp Asp Ile Val Lys Val 85 90 95 Glu Val Trp Asp Val Val Asp Lys Gly Lys Cys Lys Lys Arg Gly Asp 100 105 110 Gly Leu Lys Met Glu Asn Asp Pro Gln Glu Xaa Glu Ser Glu Met Ala 115 120 125 Leu Asp Ala Glu Phe Leu Asp Val Tyr Lys Asn Cys Asn Gly Val Val 130 135 140 Met Met Phe Asp Ile Thr Lys Gln Trp Thr Phe Asn Tyr Ile Leu Arg 145 150 155 160 Glu Leu Pro Lys Val Pro Thr His Val Pro Val Cys Val Leu Gly Asn 165 170 175 Tyr Arg Asp Met Gly Glu His Arg Val Ile Leu Pro Asp Asp Val Arg 180 185 190 Asp Phe Ile Asp Asn Leu Asp Arg Pro Pro Gly Ser Ser Tyr Phe Arg 195 200 205 Tyr Ala Glu Ser Ser Met Lys Asn Ser Phe Gly Leu Lys Tyr Leu His 210 215 220 Lys Phe Phe Asn Ile Pro Phe Leu Gln Leu Gln Arg Glu Thr Leu Leu 225 230 235 240 Arg Gln Leu Glu Thr Asn Gln Leu Asp Met Asp Ala Thr Leu Glu Glu 245 250 255 Leu Ser Val Gln Gln Glu Thr Glu Asp Gln Asn Tyr Gly Ile Phe Leu 260 265 270 Glu Met Met Glu Ala Arg Ser Arg Gly His Ala Ser Pro Leu Ala Ala 275 280 285 Asn Gly Gln Ser Pro Ser Pro Gly Ser Gln Ser Pro Val Leu Pro Ala 290 295 300 Pro Ala Val Ser Thr Gly Ser Ser Ser Pro Gly Thr Pro Gln Pro Ala 305 310 315 320 Pro Gln Leu Pro Leu Asn Ala Ala Pro Pro Ser Ser Val Pro Pro Val 325 330 335 Pro Pro Ser Glu Ala Leu Pro Pro Pro Ala Cys Pro Ser Ala Pro Ala 340 345 350 Pro Arg Arg Ser Ile Ile Ser Arg Leu Phe Gly Thr Ser Pro Ala Thr 355 360 365 Glu Ala Ala Pro Pro Pro Pro Glu Pro Val Pro Ala Ala Gln Gly Pro 370 375 380 Ala Thr Val Gln Ser Val Glu Asp Phe Val Pro Asp Asp Arg Leu Asp 385 390 395 400 Arg Ser Phe Leu Glu Asp Thr Thr Pro Ala Arg Asp Glu Lys Lys Val 405 410 415 Gly Ala Lys Ala Ala Gln Gln Asp Ser Asp Ser Asp Gly Glu Ala Leu 420 425 430 Gly Gly Asn Pro Met Val Ala Gly Phe Gln Asp Asp Val Asp Leu Glu 435 440 445 Asp Gln Pro Arg Gly Ser Pro Pro Leu Pro Ala Gly Pro Val Pro Ser 450 455 460 Gln Asp Ile Thr Leu Ser Ser Glu Glu Glu Ala Glu Val Ala Ala Pro 465 470 475 480 Thr Lys Gly 
Pro Ala Pro Ala Pro Gln Gln Cys Ser Glu Pro Glu Thr 485 490 495 Lys Trp Ser Ser Ile Pro Ala Ser Lys Pro Arg Arg Gly Thr Ala Pro 500 505 510 Thr Arg Thr Ala Ala Pro Pro Trp Pro Gly Gly Val Ser Val Arg Thr 515 520 525 Gly Pro Glu Lys Arg Ser Ser Thr Arg Pro Pro Ala Glu Met Glu Pro 530 535 540 Gly Lys Gly Glu Gln Ala Ser Ser Ser Glu Ser Asp Pro Glu Gly Pro 545 550 555 560 Ile Ala Ala Gln Met Leu Ser Phe Val Met Asp Asp Pro Asp Phe Glu 565 570 575 Ser Glu Gly Ser Asp Thr Gln Arg Arg Ala Asp Asp Phe Pro Val Arg 580 585 590 Asp Asp Pro Ser Asp Val Thr Asp Glu Asp Glu Gly Pro Ala Glu Pro 595 600 605 Pro Pro Pro Pro Lys Leu Pro Leu Pro Ala Phe Arg Leu Lys Asn Asp 610 615 620 Ser Asp Leu Phe Gly Leu Gly Leu Glu Glu Ala Gly Pro Lys Glu Ser 625 630 635 640 Ser Glu Glu Gly Lys Glu Gly Lys Thr Pro Ser Lys Glu Lys Lys Lys 645 650 655 Lys Thr Lys Ser Phe Ser Arg Val Leu Leu Glu Arg Pro Arg Ala His 660 665 670 Arg Phe Ser Thr Arg Val Gly Tyr Gln Val Ser Val Pro Asn Ser Pro 675 680 685 Tyr Ser Glu Ser Tyr 690 

What is claimed is:
 1. A method for selecting a functional nucleic acid molecule functioning in a cell, cytoplasm or nucleus, which comprises: constructing an expression vector which comprises, under the control of a promoter, a candidate nucleic acid sequence comprising a randomized sequence in a portion of a nucleic acid molecule or in another nucleic acid ligated to said nucleic acid molecule; introducing said expression vector into a cell; culturing said cell to express said nucleic acid molecule; collecting and destroying said cell to prepare an extract of an entire cell, a cytoplasm fraction or a nucleus fraction; incubating said extract of an entire cell, said cytoplasm fraction or said nucleus fraction for a certain period of time; and obtaining a nucleic acid molecule remaining in said extract of an entire cell, said cytoplasm fraction or said nucleus fraction.
 2. The method according to claim 1, which further comprises repeating the same cycle one or more times for the obtained functional nucleic acid molecule, provided that the period of time for incubating said extract of an entire cell, said cytoplasm fraction or said nucleus fraction is longer than that in the immediately preceding cycle.
 3. The method according to claim 1, wherein the nucleic acid molecule is determined for its stability or transcriptional level, using an increase in the amount of the nucleic acid molecule present as an indicator.
 4. The method according to claim 1, wherein the nucleic acid molecule is determined for its function, using activity of the nucleic acid molecule as an indicator.
 5. The method according to claim 1, wherein the nucleic acid molecule is determined for its intracellular localization, using the existence of the nucleic acid molecule in said cytoplasm fraction or said nucleus fraction as an indicator.
 6. The method according to claim 1, wherein a linker is present between the promoter and the nucleic acid molecule, and the linker sequence is randomized.
 7. The method according to claim 1, wherein the nucleic acid molecule is selected from the group consisting of an antisense RNA, an antisense DNA, an aptamer and a DNA-RNA hybrid.
 8. The method according to claim 1, wherein the nucleic acid molecule is a nucleic acid enzyme.
 9. The method according to claim 8, wherein the nucleic acid enzyme is a ribozyme.
 10. The method according to claim 9, wherein the ribozyme is a hammerhead ribozyme.
 11. A novel functional nucleic acid molecule selected by the method according to claim 1, wherein the functional nucleic acid molecule has an increased transcriptional level, stability or activity within cells, or exhibits altered intracellular localization, when compared with a corresponding control nucleic acid molecule.
 12. The functional nucleic acid molecule according to claim 11, which is a ribozyme.
 13. The functional nucleic acid molecule according to claim 12, wherein said ribozyme is localized in the cytoplasm and has high stability and high activity.
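The selection logic of claims 1 to 3 (incubate the extract, keep what remains, then repeat with a longer incubation) can be illustrated with a toy numerical model. This is a minimal sketch only, assuming that each candidate degrades by first-order kinetics in the extract and that a fixed fraction of the best-surviving candidates is carried into the next cycle. The `Candidate` class, the decay rates, the starting amounts and `keep_fraction` are hypothetical and are not taken from the specification. The wet-lab steps (vector construction, transfection, fractionation, recovery of the remaining RNA) are not modeled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    decay_rate: float  # hypothetical first-order degradation rate in the extract (per minute)
    amount: float      # hypothetical relative amount expressed before incubation


def remaining(c: Candidate, minutes: float) -> float:
    """Amount left after incubating the extract, assuming first-order decay."""
    return c.amount * math.exp(-c.decay_rate * minutes)


def run_selection(pool, incubation_times, keep_fraction=0.5):
    """Run repeated selection cycles over a candidate pool.

    Each cycle ranks candidates by the amount remaining after incubation and
    keeps the top fraction (at least one). As in claim 2, every incubation
    must be longer than the one in the immediately preceding cycle.
    """
    if any(later <= earlier for earlier, later in zip(incubation_times, incubation_times[1:])):
        raise ValueError("each incubation must be longer than the preceding one")
    survivors = list(pool)
    history = []
    for t in incubation_times:
        ranked = sorted(survivors, key=lambda c: remaining(c, t), reverse=True)
        n_keep = max(1, math.ceil(len(ranked) * keep_fraction))
        survivors = ranked[:n_keep]
        history.append((t, [c.name for c in survivors]))
    return survivors, history


if __name__ == "__main__":
    pool = [
        Candidate("A", decay_rate=0.20, amount=1.0),
        Candidate("B", decay_rate=0.05, amount=0.6),
        Candidate("C", decay_rate=0.01, amount=0.3),
        Candidate("D", decay_rate=0.50, amount=2.0),
    ]
    survivors, history = run_selection(pool, [5, 30, 120])
    for t, names in history:
        print(f"{t:>4} min: {names}")
    print("selected:", [c.name for c in survivors])
```

With these made-up numbers, candidate B is selected. Candidate C, although the most stable, is lost in the first short cycle because it starts at the lowest amount. This reflects the point of claim 3 that the amount remaining depends on both the transcriptional level and the stability of the molecule.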